FileNotFoundExceptions while running CarbonData

2 messages

FileNotFoundExceptions while running CarbonData

Swapnil Shinde
Hello
    I am new to CarbonData and we are trying to use it in production. I built and installed it on our Spark edge nodes as per the given instructions -

Build - No major issues.
Installation - Followed the yarn installation instructions (http://carbondata.apache.org/installation-guide.html).
Infrastructure - Spark 2.1.0 on MapR cluster.
carbon.properties changes -
          carbon.storelocation=/tmp/hacluster/Opt/CarbonStore
          carbon.badRecords.location=/opt/Carbon/Spark/badrecords
          carbon.lock.type=HDFSLOCK
spark-default.conf changes -
         spark.yarn.dist.files     /opt/mapr/spark/spark-2.1.0/conf/carbon.properties
         spark.yarn.dist.archives    /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata.tar.gz
         spark.executor.extraJavaOptions    -Dcarbon.properties.filepath=carbon.properties
         spark.driver.extraJavaOptions     -Dcarbon.properties.filepath=/opt/mapr/spark/spark-2.1.0/conf/carbon.properties
Command line -
/opt/mapr/spark/spark-2.1.0/bin/spark-shell --name "My app" --master yarn --jars /opt/mapr/spark/spark-2.1.0/carbonlib/carbondata_2.11-1.1.0-shade-hadoop2.2.0.jar \
--driver-memory 1g \
--executor-cores 2 \
--executor-memory 2G
Code snippet -
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("/mapr/ri0.abc.com/tmp", "/mapr/ri0.abc.com/tmp")
carbon.sql("""CREATE TABLE
                        IF NOT EXISTS test_table(
                                  id string,
                                  name string,
                                  city string,
                                  age Int)
                       STORED BY 'carbondata'""")
  
carbon.sql("""LOAD DATA INPATH '/mapr/ri0.comscore.com/tmp/sample.csv'
                  INTO TABLE test_table""")

First error -
      Initial error was "Dictionary file is locked for updation". Further debugging showed that it was due to missing maprFS filesystem support (HDFSFileLock.java line # 52) -
String hdfsPath = conf.get(CarbonCommonConstants.FS_DEFAULT_FS)

I added some code to work around this for paths like maprfs:///* and that seemed to work fine (e.g. adding a MAPRFS FileType).
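To make the workaround concrete, here is a small self-contained sketch of the kind of change described: a file-type dispatch keyed on the path scheme, with a new MAPRFS branch so maprfs:// paths no longer fall through to the default. The class, enum, and method names are illustrative only, not CarbonData's actual API.

```java
// Illustrative sketch (not CarbonData source): dispatch on the path prefix
// to pick a file type, adding a branch for the maprfs:// scheme.
public class FileTypeSketch {
    public enum FileType { LOCAL, HDFS, MAPRFS }

    public static FileType getFileType(String path) {
        if (path.startsWith("hdfs://")) {
            return FileType.HDFS;
        }
        if (path.startsWith("maprfs://")) {
            return FileType.MAPRFS; // the new case the workaround adds
        }
        // Without the maprfs branch, MapR paths would land here
        return FileType.LOCAL;
    }

    public static void main(String[] args) {
        System.out.println(getFileType("maprfs:///tmp/hacluster/Opt/CarbonStore"));
    }
}
```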

Second error -
   The first error was gone after the maprFS refactoring, but it then fails with the error below. It seems the *.dict & *.dictmeta files are not getting created. Could you please help me resolve this error?
[Inline image 1: error screenshot, not preserved in the archive]


Thanks
Swapnil


Re: FileNotFoundExceptions while running CarbonData

Liang Chen
Hi Swapnil

Really looking forward to seeing your PR.
Please let me know your Apache JIRA email id, and I will add contributor rights for you.

Regards
Liang

2017-07-18 6:49 GMT+08:00 Swapnil Shinde <[hidden email]>:
Thanks. I think I fixed it to support maprFS. I will do some more testing and
then add a JIRA ticket and PR.

On Mon, Jul 17, 2017 at 11:51 AM, Ravindra Pesala <[hidden email]>
wrote:

> Hi,
>
> Right now we don't have support for the maprfs filesystem, so behavior would be
> unpredictable even though you have fixed it in some places. We need to check
> all places and add maprfs support. So it would be great if you can
> add support for maprfs in carbon.
>
> One more observation: please provide absolute paths along with the
> maprfs:// scheme in all places instead of relative paths. Also make
> sure that the storelocation in carbon.properties and the store location
> used when creating the carbon session are the same.
>
> Regards,
> Ravindra.
>
> --
> Thanks & Regards,
> Ravi
>
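
Ravindra's advice in the thread (use absolute paths with an explicit maprfs:// scheme, and keep carbon.properties and the carbon session pointing at the same store location) can be illustrated with a small self-contained check. The class and helper names are illustrative, not CarbonData API.

```java
import java.net.URI;

// Illustrative check (not CarbonData code): verify a store location is an
// absolute path with an explicit maprfs:// scheme, and that the value from
// carbon.properties matches the one passed to getOrCreateCarbonSession.
public class StoreLocationCheck {
    public static boolean isAbsoluteMaprPath(String path) {
        URI uri = URI.create(path);
        return "maprfs".equals(uri.getScheme()) && uri.getPath().startsWith("/");
    }

    public static void main(String[] args) {
        String fromProperties = "maprfs:///tmp/hacluster/Opt/CarbonStore"; // carbon.storelocation
        String fromSession    = "maprfs:///tmp/hacluster/Opt/CarbonStore"; // getOrCreateCarbonSession arg
        System.out.println(isAbsoluteMaprPath(fromProperties)
                && fromProperties.equals(fromSession)); // both must hold
    }
}
```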