Info needed -

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Info needed -

Deepak_Kulkarni

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
Reply | Threaded
Open this post in threaded view
|

Re: Info needed -

Ravindra Pesala
Hello,

1. Currently there is no way to impose primary key indexes on carbon. We may consider it in future. 

2. We have datamaps interface opened for implementing secondary indexes. Right now we have implemented for min/max , bloom indexes. User can add there own implementations as well.

3. There is no constraints supported on carbon right now.

4. Spark/hive has many predefined functions and also user can have there own user defined function and register to spark and use them carbon.

Regards,
Ravindra.

On Tue, 2 Jul 2019 at 2:52 PM, Deepak Kulkarni <[hidden email]> wrote:

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
--
Thanks & Regards,
Ravi
Reply | Threaded
Open this post in threaded view
|

Re: Info needed -

Deepak_Kulkarni
Thx. However we want to tame sure that the queries are using pre-aggregate datamaps . We tried using explain plan but the output showed in this link https://carbondata.apache.org/datamap-management.html we are not getting.

Can you help?

BR,
Deepak

On Tue, Jul 9, 2019 at 8:51 AM Ravindra Pesala <[hidden email]> wrote:
Hello,

1. Currently there is no way to impose primary key indexes on carbon. We may consider it in future. 

2. We have datamaps interface opened for implementing secondary indexes. Right now we have implemented for min/max , bloom indexes. User can add there own implementations as well.

3. There is no constraints supported on carbon right now.

4. Spark/hive has many predefined functions and also user can have there own user defined function and register to spark and use them carbon.

Regards,
Ravindra.

On Tue, 2 Jul 2019 at 2:52 PM, Deepak Kulkarni <[hidden email]> wrote:

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
--
Thanks & Regards,
Ravi


--
Regards,
Deepak
Reply | Threaded
Open this post in threaded view
|

Re: Info needed -

Ravindra Pesala
Hi,

You want to use pre-aggregate datamaps, please try the example `org.apache.carbondata.examples.PreAggregateDataMapExample` to understand how it works. Please note that we are going to obsolete pre-aggregate datamaps from next version and make MV(materialized view ) datamaps going to replace it.

Regards,
Ravindra.

On Thu, 11 Jul 2019 at 17:40, Deepak Kulkarni <[hidden email]> wrote:
Thx. However we want to tame sure that the queries are using pre-aggregate datamaps . We tried using explain plan but the output showed in this link https://carbondata.apache.org/datamap-management.html we are not getting.

Can you help?

BR,
Deepak

On Tue, Jul 9, 2019 at 8:51 AM Ravindra Pesala <[hidden email]> wrote:
Hello,

1. Currently there is no way to impose primary key indexes on carbon. We may consider it in future. 

2. We have datamaps interface opened for implementing secondary indexes. Right now we have implemented for min/max , bloom indexes. User can add there own implementations as well.

3. There is no constraints supported on carbon right now.

4. Spark/hive has many predefined functions and also user can have there own user defined function and register to spark and use them carbon.

Regards,
Ravindra.

On Tue, 2 Jul 2019 at 2:52 PM, Deepak Kulkarni <[hidden email]> wrote:

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi
Reply | Threaded
Open this post in threaded view
|

Re: Info needed -

Deepak_Kulkarni
thx. We tried this example but we did not get the similar output for EXPLAIN PLAN command. Can yo help?
i

On Fri, Jul 12, 2019 at 4:08 PM Ravindra Pesala <[hidden email]> wrote:
Hi,

You want to use pre-aggregate datamaps, please try the example `org.apache.carbondata.examples.PreAggregateDataMapExample` to understand how it works. Please note that we are going to obsolete pre-aggregate datamaps from next version and make MV(materialized view ) datamaps going to replace it.

Regards,
Ravindra.

On Thu, 11 Jul 2019 at 17:40, Deepak Kulkarni <[hidden email]> wrote:
Thx. However we want to tame sure that the queries are using pre-aggregate datamaps . We tried using explain plan but the output showed in this link https://carbondata.apache.org/datamap-management.html we are not getting.

Can you help?

BR,
Deepak

On Tue, Jul 9, 2019 at 8:51 AM Ravindra Pesala <[hidden email]> wrote:
Hello,

1. Currently there is no way to impose primary key indexes on carbon. We may consider it in future. 

2. We have datamaps interface opened for implementing secondary indexes. Right now we have implemented for min/max , bloom indexes. User can add there own implementations as well.

3. There is no constraints supported on carbon right now.

4. Spark/hive has many predefined functions and also user can have there own user defined function and register to spark and use them carbon.

Regards,
Ravindra.

On Tue, 2 Jul 2019 at 2:52 PM, Deepak Kulkarni <[hidden email]> wrote:

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak
Reply | Threaded
Open this post in threaded view
|

Re: Info needed -

Ravindra Pesala
Hi,

Please provide the script/test case  you are using, I can try it.

Regards,
Ravindra

On Fri, 12 Jul 2019 at 16:55, Deepak Kulkarni <[hidden email]> wrote:
thx. We tried this example but we did not get the similar output for EXPLAIN PLAN command. Can yo help?
i

On Fri, Jul 12, 2019 at 4:08 PM Ravindra Pesala <[hidden email]> wrote:
Hi,

You want to use pre-aggregate datamaps, please try the example `org.apache.carbondata.examples.PreAggregateDataMapExample` to understand how it works. Please note that we are going to obsolete pre-aggregate datamaps from next version and make MV(materialized view ) datamaps going to replace it.

Regards,
Ravindra.

On Thu, 11 Jul 2019 at 17:40, Deepak Kulkarni <[hidden email]> wrote:
Thx. However we want to tame sure that the queries are using pre-aggregate datamaps . We tried using explain plan but the output showed in this link https://carbondata.apache.org/datamap-management.html we are not getting.

Can you help?

BR,
Deepak

On Tue, Jul 9, 2019 at 8:51 AM Ravindra Pesala <[hidden email]> wrote:
Hello,

1. Currently there is no way to impose primary key indexes on carbon. We may consider it in future. 

2. We have datamaps interface opened for implementing secondary indexes. Right now we have implemented for min/max , bloom indexes. User can add there own implementations as well.

3. There is no constraints supported on carbon right now.

4. Spark/hive has many predefined functions and also user can have there own user defined function and register to spark and use them carbon.

Regards,
Ravindra.

On Tue, 2 Jul 2019 at 2:52 PM, Deepak Kulkarni <[hidden email]> wrote:

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi
Reply | Threaded
Open this post in threaded view
|

Re: Info needed -

Deepak_Kulkarni
Hello,

This is what we have been doing -

Please find below steps that we followed to create tables and datamap and tried explain command to see if we can get some information about the use of datamap while query execution but we did not get the expected output :-

 

  1. Table creation:

carbon.sql("CREATE TABLE MTRC00_NRAW_000_RECENT_1572019_2 (IDX_METRIC int, IDX_RESOURCE int, DTE_DATE int, DBL_VALUE int) stored by 'org.apache.carbondata.format'").show()

2019-07-15 23:03:40 AUDIT audit:72 - {"time":"July 15, 2019 11:03:40 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174429655716046","opStatus":"START"}

2019-07-15 23:03:40 WARN  HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`mtrc00_nraw_000_recent_1572019_2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

2019-07-15 23:03:40 AUDIT audit:93 - {"time":"July 15, 2019 11:03:40 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174429655716046","opStatus":"SUCCESS","opTime":"87 ms","table":"default.mtrc00_nraw_000_recent_1572019_2","extraInfo":{"bad_record_path":"","local_dictionary_enable":"true","external":"false","sort_columns":"","comment":""}}

++

||

++

++

  1. DataMap Creation:

 

scala> carbon.sql("CREATE DATAMAP DM_preaggregate_new on table MTRC00_NRAW_000_RECENT_1572019_2 USING 'preaggregate' AS select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE")

2019-07-15 23:05:35 AUDIT audit:72 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE DATAMAP","opId":"11174544343554709","opStatus":"START"}

2019-07-15 23:05:35 AUDIT audit:72 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174544390284045","opStatus":"START"}

2019-07-15 23:05:35 WARN  HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`mtrc00_nraw_000_recent_1572019_2_dm_preaggregate_new` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

2019-07-15 23:05:35 AUDIT audit:93 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174544390284045","opStatus":"SUCCESS","opTime":"96 ms","table":"default.mtrc00_nraw_000_recent_1572019_2_dm_preaggregate_new","extraInfo":{"local_dictionary_threshold":"10000","bad_record_path":"","table_blocksize":"1024","sort_scope":"LOCAL_SORT","local_dictionary_enable":"true","flat_folder":"false","external":"false","sort_columns":"","comment":"","_internal.deferred.rebuild":"false"}}

2019-07-15 23:05:35 AUDIT audit:93 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE DATAMAP","opId":"11174544343554709","opStatus":"SUCCESS","opTime":"231 ms","table":"default.mtrc00_nraw_000_recent_1572019_2","extraInfo":{"provider":"preaggregate","dmName":"DM_preaggregate_new"}}

res121: org.apache.spark.sql.DataFrame = []

 

  1. Data Loading:

scala> carbon.sql("LOAD DATA inpath 'hdfs://10.46.40.149:9000/root/test/Wireline/NRAW/MTRC00_NRAW_000_RECENT.csv' into table MTRC00_NRAW_000_RECENT_1572019_2  options('BAD_RECORDS_ACTION'='FORCE','DELIMITER'=',','FILEHEADER'='IDX_METRIC,IDX_RESOURCE,DTE_DATE,DBL_VALUE')");

2019-07-15 23:08:50 AUDIT audit:72 - {"time":"July 15, 2019 11:08:50 PM IST","username":"root","opName":"LOAD DATA","opId":"11174739779154705","opStatus":"START"}

2019-07-15 23:08:51 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-15 23:08:51 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11174740740416991

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      6 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   6 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

  1. Query Execution with EXPLAIN command:-

 

scala> carbon.sql("EXPLAIN select groupNumber, resourceNumber,SUM(VsumM1) as VsumM1, SUM(VcountM1) as VcountM1, SUM(VsumM2) as VsumM2, SUM(VcountM2) as VcountM2, SUM(VsumM3) as VsumM3, SUM(VcountM3) as VcountM3, SUM(VsumM4) as VsumM4, SUM(VcountM4) as VcountM4, SUM(VsumM5) as VsumM5, SUM(VcountM5) as VcountM5,(SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2210 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M1NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2211 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M2NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2214 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M3NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2215 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M4NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2234 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M5NAME FROM (select S1.IDX_GROUP as groupNumber ,M1.idx_resource as resourceNumber ,CASE when m1.idx_metric=2210 then SUM(M1.dbl_value) end as VsumM1,CASE when m1.idx_metric=2210 then count(m1.dbl_value) end as VcountM1,CASE when m1.idx_metric=2211 then SUM(M1.dbl_value) end as VsumM2,CASE when m1.idx_metric=2211 then count(m1.dbl_value) end as VcountM2,CASE when m1.idx_metric=2214 then SUM(M1.dbl_value) end as VsumM3,CASE when m1.idx_metric=2214 then count(m1.dbl_value) end as VcountM3,CASE when m1.idx_metric=2215 then SUM(M1.dbl_value) end as VsumM4,CASE when m1.idx_metric=2215 then count(m1.dbl_value) end as VcountM4,CASE when m1.idx_metric=2234 then SUM(M1.dbl_value) end as VsumM5,CASE when m1.idx_metric=2234 then count(m1.dbl_value) end as VcountM5 FROM MTRC00_NRAW_000_RECENT_1572019_2  M1, SE_GRP_MEMBER_HIST S1 where M1.idx_resource=S1.IDX_ALIAS_INST and S1.IDX_GROUP=200075906 and M1.idx_resource= 200071954 and M1.IDX_METRIC in (2210,2211,2214,2215,2234) group by M1.IDX_METRIC,M1.idx_resource,S1.IDX_GROUP) GROUP BY groupNumber,resourceNumber HAVING SUM(VcountM1) <> 0 OR SUM(VcountM2) <> 0 OR SUM(VcountM3) <> 0 OR SUM(VcountM4) <> 0 OR SUM(VcountM5) <> 0").show()

2019-07-16 12:33:25 AUDIT audit:72 - {"time":"July 16, 2019 12:33:25 PM IST","username":"root","opName":"EXPLAIN","opId":"11223014419086786","opStatus":"START"}

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 2

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014841650012

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     10 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  10 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014952642509

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014986644062

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014913695441

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223015020278657

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     11 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  11 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014881878925

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014903200116

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      1 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014976273697

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      4 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223015007180119

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      4 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014939414859

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      6 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   6 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014868807332

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   3 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:27 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014828073061

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   3 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:27 AUDIT audit:93 - {"time":"July 16, 2019 12:33:27 PM IST","username":"root","opName":"EXPLAIN","opId":"11223014419086786","opStatus":"SUCCESS","opTime":"2570 ms","table":"NA","extraInfo":{"query":"'Filter (((NOT ('SUM('VcountM1) = 0) || NOT ('SUM('VcountM2) = 0)) || NOT ('SUM('VcountM3) = 0)) || (NOT ('SUM('VcountM4) = 0) || NOT ('SUM('VcountM5) = 0)))"}}

+--------------------+

|                plan|

+--------------------+

|== CarbonData Pro...|

|== Physical Plan ...|

+--------------------+

  1. Even with simple select query we do not see any reference of datamp being getting hit while execution:-

 

scala> carbon.sql("EXPLAIN select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE").show()

2019-07-16 12:37:47 AUDIT audit:72 - {"time":"July 16, 2019 12:37:47 PM IST","username":"root","opName":"EXPLAIN","opId":"11223276220632808","opStatus":"START"}

2019-07-16 12:37:47 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:37:47 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223276310826076

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     16 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  16 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:37:47 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:37:47 AUDIT audit:93 - {"time":"July 16, 2019 12:37:47 PM IST","username":"root","opName":"EXPLAIN","opId":"11223276220632808","opStatus":"SUCCESS","opTime":"288 ms","table":"NA","extraInfo":{"query":"'Aggregate ['IDX_METRIC, 'IDX_RESOURCE], ['IDX_METRIC, 'IDX_RESOURCE, unresolvedalias('sum('dbl_value), None), unresolvedalias('avg('dbl_value), None), unresolvedalias('min('dbl_value), None), unresolvedalias('max('dbl_value), None), unresolvedalias('count('dbl_value), None)]"}}

+--------------------+

|                plan|

+--------------------+

|== CarbonData Pro...|

|== Physical Plan ...|

+--------------------+

The required carbon property  enable.query.statistics = true has been set .


BR,

Deepak 


On Mon, Jul 15, 2019 at 11:24 AM Ravindra Pesala <[hidden email]> wrote:
Hi,

Please provide the script/test case  you are using, I can try it.

Regards,
Ravindra

On Fri, 12 Jul 2019 at 16:55, Deepak Kulkarni <[hidden email]> wrote:
thx. We tried this example but we did not get the similar output for EXPLAIN PLAN command. Can yo help?
i

On Fri, Jul 12, 2019 at 4:08 PM Ravindra Pesala <[hidden email]> wrote:
Hi,

You want to use pre-aggregate datamaps, please try the example `org.apache.carbondata.examples.PreAggregateDataMapExample` to understand how it works. Please note that we are going to obsolete pre-aggregate datamaps from next version and make MV(materialized view ) datamaps going to replace it.

Regards,
Ravindra.

On Thu, 11 Jul 2019 at 17:40, Deepak Kulkarni <[hidden email]> wrote:
Thx. However we want to tame sure that the queries are using pre-aggregate datamaps . We tried using explain plan but the output showed in this link https://carbondata.apache.org/datamap-management.html we are not getting.

Can you help?

BR,
Deepak

On Tue, Jul 9, 2019 at 8:51 AM Ravindra Pesala <[hidden email]> wrote:
Hello,

1. Currently there is no way to impose primary key indexes on carbon. We may consider it in future. 

2. We have datamaps interface opened for implementing secondary indexes. Right now we have implemented for min/max , bloom indexes. User can add there own implementations as well.

3. There is no constraints supported on carbon right now.

4. Spark/hive has many predefined functions and also user can have there own user defined function and register to spark and use them carbon.

Regards,
Ravindra.

On Tue, 2 Jul 2019 at 2:52 PM, Deepak Kulkarni <[hidden email]> wrote:

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak
Reply | Threaded
Open this post in threaded view
|

Re: Info needed -

Ravindra Pesala
Hi Deepak,

I have verified your query and pre-aggregate works as expected. 

spark.sql("DROP TABLE IF EXISTS MTRC00_NRAW_000_RECENT_1572019_2")
spark.sql("CREATE TABLE MTRC00_NRAW_000_RECENT_1572019_2 (IDX_METRIC int, IDX_RESOURCE int, DTE_DATE int, DBL_VALUE int) stored by 'org.apache.carbondata.format'")

spark.sql("CREATE DATAMAP DM_preaggregate_new on table MTRC00_NRAW_000_RECENT_1572019_2 USING 'preaggregate' AS select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE")

// show datamap
spark.sql("show datamap on table MTRC00_NRAW_000_RECENT_1572019_2").show(false)

spark.sql("insert into MTRC00_NRAW_000_RECENT_1572019_2 select 1,1,1,1")

spark.sql("EXPLAIN select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE").show(false)

Output
===========
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|== Physical Plan ==
*(2) HashAggregate(keys=[mtrc00_nraw_000_recent_1572019_2_idx_metric#380, mtrc00_nraw_000_recent_1572019_2_idx_resource#381], functions=[sum(mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L), sum(cast(mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L as double)), sum(cast(mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L as double)), min(mtrc00_nraw_000_recent_1572019_2_dbl_value_min#384), max(mtrc00_nraw_000_recent_1572019_2_dbl_value_max#385), sum(mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L)])
+- Exchange hashpartitioning(mtrc00_nraw_000_recent_1572019_2_idx_metric#380, mtrc00_nraw_000_recent_1572019_2_idx_resource#381, 200)
   +- *(1) HashAggregate(keys=[mtrc00_nraw_000_recent_1572019_2_idx_metric#380, mtrc00_nraw_000_recent_1572019_2_idx_resource#381], functions=[partial_sum(mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L), partial_sum(cast(mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L as double)), partial_sum(cast(mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L as double)), partial_min(mtrc00_nraw_000_recent_1572019_2_dbl_value_min#384), partial_max(mtrc00_nraw_000_recent_1572019_2_dbl_value_max#385), partial_sum(mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L)])
      +- *(1) FileScan carbondata default.mtrc00_nraw_000_recent_1572019_2_dm_preaggregate_new[mtrc00_nraw_000_recent_1572019_2_idx_metric#380,mtrc00_nraw_000_recent_1572019_2_idx_resource#381,mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L,mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L,mtrc00_nraw_000_recent_1572019_2_dbl_value_min#384,mtrc00_nraw_000_recent_1572019_2_dbl_value_max#385]|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In the above explain plan clearly shows that it has choosen pre-aggregate table. I am attaching the example scala file for your reference.


Regards,
Ravindra.

On Tue, 16 Jul 2019 at 15:28, Deepak Kulkarni <[hidden email]> wrote:
Hello,

This is what we have been doing -

Please find below steps that we followed to create tables and datamap and tried explain command to see if we can get some information about the use of datamap while query execution but we did not get the expected output :-

 

  1. Table creation:

carbon.sql("CREATE TABLE MTRC00_NRAW_000_RECENT_1572019_2 (IDX_METRIC int, IDX_RESOURCE int, DTE_DATE int, DBL_VALUE int) stored by 'org.apache.carbondata.format'").show()

2019-07-15 23:03:40 AUDIT audit:72 - {"time":"July 15, 2019 11:03:40 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174429655716046","opStatus":"START"}

2019-07-15 23:03:40 WARN  HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`mtrc00_nraw_000_recent_1572019_2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

2019-07-15 23:03:40 AUDIT audit:93 - {"time":"July 15, 2019 11:03:40 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174429655716046","opStatus":"SUCCESS","opTime":"87 ms","table":"default.mtrc00_nraw_000_recent_1572019_2","extraInfo":{"bad_record_path":"","local_dictionary_enable":"true","external":"false","sort_columns":"","comment":""}}

++

||

++

++

  1. DataMap Creation:

 

scala> carbon.sql("CREATE DATAMAP DM_preaggregate_new on table MTRC00_NRAW_000_RECENT_1572019_2 USING 'preaggregate' AS select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE")

2019-07-15 23:05:35 AUDIT audit:72 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE DATAMAP","opId":"11174544343554709","opStatus":"START"}

2019-07-15 23:05:35 AUDIT audit:72 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174544390284045","opStatus":"START"}

2019-07-15 23:05:35 WARN  HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`mtrc00_nraw_000_recent_1572019_2_dm_preaggregate_new` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

2019-07-15 23:05:35 AUDIT audit:93 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174544390284045","opStatus":"SUCCESS","opTime":"96 ms","table":"default.mtrc00_nraw_000_recent_1572019_2_dm_preaggregate_new","extraInfo":{"local_dictionary_threshold":"10000","bad_record_path":"","table_blocksize":"1024","sort_scope":"LOCAL_SORT","local_dictionary_enable":"true","flat_folder":"false","external":"false","sort_columns":"","comment":"","_internal.deferred.rebuild":"false"}}

2019-07-15 23:05:35 AUDIT audit:93 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE DATAMAP","opId":"11174544343554709","opStatus":"SUCCESS","opTime":"231 ms","table":"default.mtrc00_nraw_000_recent_1572019_2","extraInfo":{"provider":"preaggregate","dmName":"DM_preaggregate_new"}}

res121: org.apache.spark.sql.DataFrame = []

 

  1. Data Loading:

scala> carbon.sql("LOAD DATA inpath 'hdfs://10.46.40.149:9000/root/test/Wireline/NRAW/MTRC00_NRAW_000_RECENT.csv' into table MTRC00_NRAW_000_RECENT_1572019_2  options('BAD_RECORDS_ACTION'='FORCE','DELIMITER'=',','FILEHEADER'='IDX_METRIC,IDX_RESOURCE,DTE_DATE,DBL_VALUE')");

2019-07-15 23:08:50 AUDIT audit:72 - {"time":"July 15, 2019 11:08:50 PM IST","username":"root","opName":"LOAD DATA","opId":"11174739779154705","opStatus":"START"}

2019-07-15 23:08:51 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-15 23:08:51 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11174740740416991

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      6 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   6 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

  1. Query Execution with EXPLAIN command:-

 

scala> carbon.sql("EXPLAIN select groupNumber, resourceNumber,SUM(VsumM1) as VsumM1, SUM(VcountM1) as VcountM1, SUM(VsumM2) as VsumM2, SUM(VcountM2) as VcountM2, SUM(VsumM3) as VsumM3, SUM(VcountM3) as VcountM3, SUM(VsumM4) as VsumM4, SUM(VcountM4) as VcountM4, SUM(VsumM5) as VsumM5, SUM(VcountM5) as VcountM5,(SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2210 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M1NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2211 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M2NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2214 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M3NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2215 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M4NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2234 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M5NAME FROM (select S1.IDX_GROUP as groupNumber ,M1.idx_resource as resourceNumber ,CASE when m1.idx_metric=2210 then SUM(M1.dbl_value) end as VsumM1,CASE when m1.idx_metric=2210 then count(m1.dbl_value) end as VcountM1,CASE when m1.idx_metric=2211 then SUM(M1.dbl_value) end as VsumM2,CASE when m1.idx_metric=2211 then count(m1.dbl_value) end as VcountM2,CASE when m1.idx_metric=2214 then SUM(M1.dbl_value) end as VsumM3,CASE when m1.idx_metric=2214 then count(m1.dbl_value) end as VcountM3,CASE when m1.idx_metric=2215 then SUM(M1.dbl_value) end as VsumM4,CASE when m1.idx_metric=2215 then count(m1.dbl_value) end as VcountM4,CASE when m1.idx_metric=2234 then SUM(M1.dbl_value) end as VsumM5,CASE when m1.idx_metric=2234 then count(m1.dbl_value) end as VcountM5 FROM MTRC00_NRAW_000_RECENT_1572019_2  M1, SE_GRP_MEMBER_HIST S1 where M1.idx_resource=S1.IDX_ALIAS_INST and S1.IDX_GROUP=200075906 and M1.idx_resource= 200071954 and M1.IDX_METRIC in (2210,2211,2214,2215,2234) group by M1.IDX_METRIC,M1.idx_resource,S1.IDX_GROUP) GROUP BY groupNumber,resourceNumber HAVING SUM(VcountM1) <> 0 OR SUM(VcountM2) <> 0 OR SUM(VcountM3) <> 0 OR SUM(VcountM4) <> 0 OR SUM(VcountM5) <> 0").show()

2019-07-16 12:33:25 AUDIT audit:72 - {"time":"July 16, 2019 12:33:25 PM IST","username":"root","opName":"EXPLAIN","opId":"11223014419086786","opStatus":"START"}

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 2

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014841650012

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     10 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  10 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014952642509

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014986644062

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014913695441

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223015020278657

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     11 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  11 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014881878925

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014903200116

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      1 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014976273697

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      4 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223015007180119

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      4 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014939414859

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      6 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   6 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014868807332

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   3 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:27 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014828073061

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   3 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:27 AUDIT audit:93 - {"time":"July 16, 2019 12:33:27 PM IST","username":"root","opName":"EXPLAIN","opId":"11223014419086786","opStatus":"SUCCESS","opTime":"2570 ms","table":"NA","extraInfo":{"query":"'Filter (((NOT ('SUM('VcountM1) = 0) || NOT ('SUM('VcountM2) = 0)) || NOT ('SUM('VcountM3) = 0)) || (NOT ('SUM('VcountM4) = 0) || NOT ('SUM('VcountM5) = 0)))"}}

+--------------------+

|                plan|

+--------------------+

|== CarbonData Pro...|

|== Physical Plan ...|

+--------------------+

  1. Even with simple select query we do not see any reference of datamp being getting hit while execution:-

 

scala> carbon.sql("EXPLAIN select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE").show()

2019-07-16 12:37:47 AUDIT audit:72 - {"time":"July 16, 2019 12:37:47 PM IST","username":"root","opName":"EXPLAIN","opId":"11223276220632808","opStatus":"START"}

2019-07-16 12:37:47 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:37:47 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223276310826076

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     16 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  16 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:37:47 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:37:47 AUDIT audit:93 - {"time":"July 16, 2019 12:37:47 PM IST","username":"root","opName":"EXPLAIN","opId":"11223276220632808","opStatus":"SUCCESS","opTime":"288 ms","table":"NA","extraInfo":{"query":"'Aggregate ['IDX_METRIC, 'IDX_RESOURCE], ['IDX_METRIC, 'IDX_RESOURCE, unresolvedalias('sum('dbl_value), None), unresolvedalias('avg('dbl_value), None), unresolvedalias('min('dbl_value), None), unresolvedalias('max('dbl_value), None), unresolvedalias('count('dbl_value), None)]"}}

+--------------------+

|                plan|

+--------------------+

|== CarbonData Pro...|

|== Physical Plan ...|

+--------------------+

The required carbon property  enable.query.statistics = true has been set .


BR,

Deepak 


On Mon, Jul 15, 2019 at 11:24 AM Ravindra Pesala <[hidden email]> wrote:
Hi,

Please provide the script/test case  you are using, I can try it.

Regards,
Ravindra

On Fri, 12 Jul 2019 at 16:55, Deepak Kulkarni <[hidden email]> wrote:
thx. We tried this example but we did not get the similar output for EXPLAIN PLAN command. Can yo help?
i

On Fri, Jul 12, 2019 at 4:08 PM Ravindra Pesala <[hidden email]> wrote:
Hi,

You want to use pre-aggregate datamaps, please try the example `org.apache.carbondata.examples.PreAggregateDataMapExample` to understand how it works. Please note that we are going to obsolete pre-aggregate datamaps from next version and make MV(materialized view ) datamaps going to replace it.

Regards,
Ravindra.

On Thu, 11 Jul 2019 at 17:40, Deepak Kulkarni <[hidden email]> wrote:
Thx. However we want to tame sure that the queries are using pre-aggregate datamaps . We tried using explain plan but the output showed in this link https://carbondata.apache.org/datamap-management.html we are not getting.

Can you help?

BR,
Deepak

On Tue, Jul 9, 2019 at 8:51 AM Ravindra Pesala <[hidden email]> wrote:
Hello,

1. Currently there is no way to impose primary key indexes on carbon. We may consider it in future. 

2. We have datamaps interface opened for implementing secondary indexes. Right now we have implemented for min/max , bloom indexes. User can add there own implementations as well.

3. There is no constraints supported on carbon right now.

4. Spark/hive has many predefined functions and also user can have there own user defined function and register to spark and use them carbon.

Regards,
Ravindra.

On Tue, 2 Jul 2019 at 2:52 PM, Deepak Kulkarni <[hidden email]> wrote:

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi

PreAggregateDataMapExample.scala (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Info needed -

Deepak_Kulkarni
Thanks so much, Ravindra, appreciated.

On Thu, Jul 18, 2019 at 8:03 PM Ravindra Pesala <[hidden email]> wrote:
Hi Deepak,

I have verified your query and pre-aggregate works as expected. 

spark.sql("DROP TABLE IF EXISTS MTRC00_NRAW_000_RECENT_1572019_2")
spark.sql("CREATE TABLE MTRC00_NRAW_000_RECENT_1572019_2 (IDX_METRIC int, IDX_RESOURCE int, DTE_DATE int, DBL_VALUE int) stored by 'org.apache.carbondata.format'")

spark.sql("CREATE DATAMAP DM_preaggregate_new on table MTRC00_NRAW_000_RECENT_1572019_2 USING 'preaggregate' AS select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE")

// show datamap
spark.sql("show datamap on table MTRC00_NRAW_000_RECENT_1572019_2").show(false)

spark.sql("insert into MTRC00_NRAW_000_RECENT_1572019_2 select 1,1,1,1")

spark.sql("EXPLAIN select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE").show(false)

Output
===========
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|== Physical Plan ==
*(2) HashAggregate(keys=[mtrc00_nraw_000_recent_1572019_2_idx_metric#380, mtrc00_nraw_000_recent_1572019_2_idx_resource#381], functions=[sum(mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L), sum(cast(mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L as double)), sum(cast(mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L as double)), min(mtrc00_nraw_000_recent_1572019_2_dbl_value_min#384), max(mtrc00_nraw_000_recent_1572019_2_dbl_value_max#385), sum(mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L)])
+- Exchange hashpartitioning(mtrc00_nraw_000_recent_1572019_2_idx_metric#380, mtrc00_nraw_000_recent_1572019_2_idx_resource#381, 200)
   +- *(1) HashAggregate(keys=[mtrc00_nraw_000_recent_1572019_2_idx_metric#380, mtrc00_nraw_000_recent_1572019_2_idx_resource#381], functions=[partial_sum(mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L), partial_sum(cast(mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L as double)), partial_sum(cast(mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L as double)), partial_min(mtrc00_nraw_000_recent_1572019_2_dbl_value_min#384), partial_max(mtrc00_nraw_000_recent_1572019_2_dbl_value_max#385), partial_sum(mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L)])
      +- *(1) FileScan carbondata default.mtrc00_nraw_000_recent_1572019_2_dm_preaggregate_new[mtrc00_nraw_000_recent_1572019_2_idx_metric#380,mtrc00_nraw_000_recent_1572019_2_idx_resource#381,mtrc00_nraw_000_recent_1572019_2_dbl_value_sum#382L,mtrc00_nraw_000_recent_1572019_2_dbl_value_count#383L,mtrc00_nraw_000_recent_1572019_2_dbl_value_min#384,mtrc00_nraw_000_recent_1572019_2_dbl_value_max#385]|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In the above explain plan clearly shows that it has choosen pre-aggregate table. I am attaching the example scala file for your reference.


Regards,
Ravindra.

On Tue, 16 Jul 2019 at 15:28, Deepak Kulkarni <[hidden email]> wrote:
Hello,

This is what we have been doing -

Please find below steps that we followed to create tables and datamap and tried explain command to see if we can get some information about the use of datamap while query execution but we did not get the expected output :-

 

  1. Table creation:

carbon.sql("CREATE TABLE MTRC00_NRAW_000_RECENT_1572019_2 (IDX_METRIC int, IDX_RESOURCE int, DTE_DATE int, DBL_VALUE int) stored by 'org.apache.carbondata.format'").show()

2019-07-15 23:03:40 AUDIT audit:72 - {"time":"July 15, 2019 11:03:40 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174429655716046","opStatus":"START"}

2019-07-15 23:03:40 WARN  HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`mtrc00_nraw_000_recent_1572019_2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

2019-07-15 23:03:40 AUDIT audit:93 - {"time":"July 15, 2019 11:03:40 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174429655716046","opStatus":"SUCCESS","opTime":"87 ms","table":"default.mtrc00_nraw_000_recent_1572019_2","extraInfo":{"bad_record_path":"","local_dictionary_enable":"true","external":"false","sort_columns":"","comment":""}}

++

||

++

++

  1. DataMap Creation:

 

scala> carbon.sql("CREATE DATAMAP DM_preaggregate_new on table MTRC00_NRAW_000_RECENT_1572019_2 USING 'preaggregate' AS select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE")

2019-07-15 23:05:35 AUDIT audit:72 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE DATAMAP","opId":"11174544343554709","opStatus":"START"}

2019-07-15 23:05:35 AUDIT audit:72 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174544390284045","opStatus":"START"}

2019-07-15 23:05:35 WARN  HiveExternalCatalog:66 - Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`mtrc00_nraw_000_recent_1572019_2_dm_preaggregate_new` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

2019-07-15 23:05:35 AUDIT audit:93 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE TABLE","opId":"11174544390284045","opStatus":"SUCCESS","opTime":"96 ms","table":"default.mtrc00_nraw_000_recent_1572019_2_dm_preaggregate_new","extraInfo":{"local_dictionary_threshold":"10000","bad_record_path":"","table_blocksize":"1024","sort_scope":"LOCAL_SORT","local_dictionary_enable":"true","flat_folder":"false","external":"false","sort_columns":"","comment":"","_internal.deferred.rebuild":"false"}}

2019-07-15 23:05:35 AUDIT audit:93 - {"time":"July 15, 2019 11:05:35 PM IST","username":"root","opName":"CREATE DATAMAP","opId":"11174544343554709","opStatus":"SUCCESS","opTime":"231 ms","table":"default.mtrc00_nraw_000_recent_1572019_2","extraInfo":{"provider":"preaggregate","dmName":"DM_preaggregate_new"}}

res121: org.apache.spark.sql.DataFrame = []

 

  1. Data Loading:

scala> carbon.sql("LOAD DATA inpath 'hdfs://10.46.40.149:9000/root/test/Wireline/NRAW/MTRC00_NRAW_000_RECENT.csv' into table MTRC00_NRAW_000_RECENT_1572019_2  options('BAD_RECORDS_ACTION'='FORCE','DELIMITER'=',','FILEHEADER'='IDX_METRIC,IDX_RESOURCE,DTE_DATE,DBL_VALUE')");

2019-07-15 23:08:50 AUDIT audit:72 - {"time":"July 15, 2019 11:08:50 PM IST","username":"root","opName":"LOAD DATA","opId":"11174739779154705","opStatus":"START"}

2019-07-15 23:08:51 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-15 23:08:51 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11174740740416991

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      6 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   6 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

  1. Query Execution with EXPLAIN command:-

 

scala> carbon.sql("EXPLAIN select groupNumber, resourceNumber,SUM(VsumM1) as VsumM1, SUM(VcountM1) as VcountM1, SUM(VsumM2) as VsumM2, SUM(VcountM2) as VcountM2, SUM(VsumM3) as VsumM3, SUM(VcountM3) as VcountM3, SUM(VsumM4) as VsumM4, SUM(VcountM4) as VcountM4, SUM(VsumM5) as VsumM5, SUM(VcountM5) as VcountM5,(SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2210 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M1NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2211 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M2NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2214 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M3NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2215 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M4NAME, (SELECT MIN(STR_NAME) FROM FRML_DESC_HIST FRML WHERE FRML.IDX_IND = 2234 AND FRML.DTE_DATE = (SELECT MAX(DTE_DATE) FROM FRML_DESC_HIST FRML1 WHERE FRML1.IDX_IND = FRML.IDX_IND AND FRML1.DTE_DATE < 1561507200 AND FRML1.STR_ACTION in ('I','U'))) as M5NAME FROM (select S1.IDX_GROUP as groupNumber ,M1.idx_resource as resourceNumber ,CASE when m1.idx_metric=2210 then SUM(M1.dbl_value) end as VsumM1,CASE when m1.idx_metric=2210 then count(m1.dbl_value) end as VcountM1,CASE when m1.idx_metric=2211 then SUM(M1.dbl_value) end as VsumM2,CASE when m1.idx_metric=2211 then count(m1.dbl_value) end as VcountM2,CASE when m1.idx_metric=2214 then SUM(M1.dbl_value) end as VsumM3,CASE when m1.idx_metric=2214 then count(m1.dbl_value) end as VcountM3,CASE when m1.idx_metric=2215 then SUM(M1.dbl_value) end as VsumM4,CASE when m1.idx_metric=2215 then count(m1.dbl_value) end as VcountM4,CASE when m1.idx_metric=2234 then SUM(M1.dbl_value) end as VsumM5,CASE when m1.idx_metric=2234 then count(m1.dbl_value) end as VcountM5 FROM MTRC00_NRAW_000_RECENT_1572019_2  M1, SE_GRP_MEMBER_HIST S1 where M1.idx_resource=S1.IDX_ALIAS_INST and S1.IDX_GROUP=200075906 and M1.idx_resource= 200071954 and M1.IDX_METRIC in (2210,2211,2214,2215,2234) group by M1.IDX_METRIC,M1.idx_resource,S1.IDX_GROUP) GROUP BY groupNumber,resourceNumber HAVING SUM(VcountM1) <> 0 OR SUM(VcountM2) <> 0 OR SUM(VcountM3) <> 0 OR SUM(VcountM4) <> 0 OR SUM(VcountM5) <> 0").show()

2019-07-16 12:33:25 AUDIT audit:72 - {"time":"July 16, 2019 12:33:25 PM IST","username":"root","opName":"EXPLAIN","opId":"11223014419086786","opStatus":"START"}

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 2

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:25 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014841650012

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     10 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  10 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014952642509

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014986644062

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014913695441

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223015020278657

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     11 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  11 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014881878925

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      9 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   9 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014903200116

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      1 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014976273697

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      4 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223015007180119

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      4 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   4 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014939414859

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      6 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   6 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:26 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014868807332

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   3 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:27 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223014828073061

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                      3 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                   3 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:33:27 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 1

2019-07-16 12:33:27 AUDIT audit:93 - {"time":"July 16, 2019 12:33:27 PM IST","username":"root","opName":"EXPLAIN","opId":"11223014419086786","opStatus":"SUCCESS","opTime":"2570 ms","table":"NA","extraInfo":{"query":"'Filter (((NOT ('SUM('VcountM1) = 0) || NOT ('SUM('VcountM2) = 0)) || NOT ('SUM('VcountM3) = 0)) || (NOT ('SUM('VcountM4) = 0) || NOT ('SUM('VcountM5) = 0)))"}}

+--------------------+

|                plan|

+--------------------+

|== CarbonData Pro...|

|== Physical Plan ...|

+--------------------+

  1. Even with simple select query we do not see any reference of datamp being getting hit while execution:-

 

scala> carbon.sql("EXPLAIN select IDX_METRIC,IDX_RESOURCE ,sum(dbl_value),avg(dbl_value),min(dbl_value),max(dbl_value),count(dbl_value) FROM MTRC00_NRAW_000_RECENT_1572019_2 group by IDX_METRIC,IDX_RESOURCE").show()

2019-07-16 12:37:47 AUDIT audit:72 - {"time":"July 16, 2019 12:37:47 PM IST","username":"root","opName":"EXPLAIN","opId":"11223276220632808","opStatus":"START"}

2019-07-16 12:37:47 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:37:47 STATISTIC DriverQueryStatisticsRecorderImpl:123 - Print query statistic for query id: 11223276310826076

+--------+--------------------+---------------------+------------------------+

|  Module|      Operation Step|     Total Query Cost|              Query Cost|

+--------+--------------------+---------------------+------------------------+

|  Driver|  Load blocks driver|                     |                     16 |

|        +--------------------+                     +------------------------+

|    Part|    Block allocation|                  16 |                      0 |

|        +--------------------+                     +------------------------+

|        |Block identification|                     |                      0 |

+--------+--------------------+---------------------+------------------------+

 

2019-07-16 12:37:47 STATISTIC QueryStatisticsRecorderImpl:72 - Time taken for Carbon Optimizer to optimize: 0

2019-07-16 12:37:47 AUDIT audit:93 - {"time":"July 16, 2019 12:37:47 PM IST","username":"root","opName":"EXPLAIN","opId":"11223276220632808","opStatus":"SUCCESS","opTime":"288 ms","table":"NA","extraInfo":{"query":"'Aggregate ['IDX_METRIC, 'IDX_RESOURCE], ['IDX_METRIC, 'IDX_RESOURCE, unresolvedalias('sum('dbl_value), None), unresolvedalias('avg('dbl_value), None), unresolvedalias('min('dbl_value), None), unresolvedalias('max('dbl_value), None), unresolvedalias('count('dbl_value), None)]"}}

+--------------------+

|                plan|

+--------------------+

|== CarbonData Pro...|

|== Physical Plan ...|

+--------------------+

The required carbon property  enable.query.statistics = true has been set .


BR,

Deepak 


On Mon, Jul 15, 2019 at 11:24 AM Ravindra Pesala <[hidden email]> wrote:
Hi,

Please provide the script/test case  you are using, I can try it.

Regards,
Ravindra

On Fri, 12 Jul 2019 at 16:55, Deepak Kulkarni <[hidden email]> wrote:
thx. We tried this example but we did not get the similar output for EXPLAIN PLAN command. Can yo help?
i

On Fri, Jul 12, 2019 at 4:08 PM Ravindra Pesala <[hidden email]> wrote:
Hi,

You want to use pre-aggregate datamaps, please try the example `org.apache.carbondata.examples.PreAggregateDataMapExample` to understand how it works. Please note that we are going to obsolete pre-aggregate datamaps from next version and make MV(materialized view ) datamaps going to replace it.

Regards,
Ravindra.

On Thu, 11 Jul 2019 at 17:40, Deepak Kulkarni <[hidden email]> wrote:
Thx. However we want to tame sure that the queries are using pre-aggregate datamaps . We tried using explain plan but the output showed in this link https://carbondata.apache.org/datamap-management.html we are not getting.

Can you help?

BR,
Deepak

On Tue, Jul 9, 2019 at 8:51 AM Ravindra Pesala <[hidden email]> wrote:
Hello,

1. Currently there is no way to impose primary key indexes on carbon. We may consider it in future. 

2. We have datamaps interface opened for implementing secondary indexes. Right now we have implemented for min/max , bloom indexes. User can add there own implementations as well.

3. There is no constraints supported on carbon right now.

4. Spark/hive has many predefined functions and also user can have there own user defined function and register to spark and use them carbon.

Regards,
Ravindra.

On Tue, 2 Jul 2019 at 2:52 PM, Deepak Kulkarni <[hidden email]> wrote:

Hi,

 

We are evaluating the Carbondata for following alternative to Oracle –and hit following issues when achieving this migration -

 

  1. Primary Key Indexes – how we can avoid duplicate records on Carbon?
  2. Secondary Indexes
  3. Constraints such as NULL, NOT NULL – how can we set this at column levels in carbon?
  4. Functions such as DECODE – what alternatives we have in Carbon to use these SQL functions?

 

Appreciate the response.

 


--
Regards,
Deepak
--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak


--
Thanks & Regards,
Ravi


--
Regards,
Deepak