  WebDanica / WEBDAN-86

The Optimaize language-detector won't run with Pig


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • None
    • None
    • HADOOP
    • None
    • Sprint: Sprint 5 - webdanica

    Description

      When replacing the previous Tika language detector with the Optimaize language-detector (https://github.com/optimaize/language-detector), it turned out that Optimaize version 0.5 depends on the guava-18.jar library.

      However, all Pig versions up to 0.16.0 are bundled with the guava-11.jar library,
      which takes precedence over a guava-18.jar REGISTERed in the script or in .pigbootup.

      Guava 11's Splitter class has no splitToList(CharSequence) method (it was added in Guava 15), so we get the error:

      java.lang.NoSuchMethodError: com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;
           at com.optimaize.langdetect.i18n.LdLocale.fromString(LdLocale.java:77)
           at com.optimaize.langdetect.profiles.BuiltInLanguages.<clinit>(BuiltInLanguages.java:21)
           at com.optimaize.langdetect.profiles.LanguageProfileReader.readAllBuiltIn(LanguageProfileReader.java:118)
           at org.apache.tika.langdetect.OptimaizeLangDetector.loadModels(OptimaizeLangDetector.java:63)
           at dk.kb.webdanica.criteria.C4.computeNewC4(C4.java:56)
           at dk.kb.webdanica.criteria.CombinedCombo.exec(CombinedCombo.java:118)
           at dk.kb.webdanica.criteria.CombinedCombo.exec(CombinedCombo.java:85)
           at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:326)
           at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextString(POUserFunc.java:426)
           at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:341)
           at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:404)
           at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:321)
           at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:280)
           at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
           at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
           at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
           at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
           at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
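The JVM links the call to Splitter.splitToList(CharSequence) only at run time, against whichever Splitter class was loaded first. Because that is Guava 11's Splitter, LdLocale.fromString throws NoSuchMethodError. The following is a minimal diagnostic sketch, assuming you can add a class to the UDF jar. The class name GuavaProbe and its methods are hypothetical and are not part of WebDanica, Pig or Optimaize. The class reports where Splitter was loaded from and uses reflection to check whether the method exists, so a job can fail fast with a readable message.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic (not part of WebDanica, Pig or Optimaize): reports which
 * Guava the JVM actually loaded and whether it has Splitter.splitToList(CharSequence),
 * the method Optimaize's LdLocale.fromString calls.
 */
final class GuavaProbe {

    static final String SPLITTER = "com.google.common.base.Splitter";

    private GuavaProbe() {}

    /** Returns the jar/directory the named class was loaded from, or a marker string. */
    static String locate(String className) {
        try {
            // Resolve through this class's own loader, i.e. the loader the UDF jar sees.
            Class<?> cls = Class.forName(className, false, GuavaProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK's bootstrap loader have no code source.
            return (src == null || src.getLocation() == null)
                    ? "bootstrap (no code source)"
                    : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    /** True if the class can be loaded and has a public method with this signature. */
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class.forName(className, false, GuavaProbe.class.getClassLoader())
                 .getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Fails fast with a readable message instead of a NoSuchMethodError inside loadModels(). */
    static void requireSplitToList() {
        if (!hasMethod(SPLITTER, "splitToList", CharSequence.class)) {
            throw new IllegalStateException("Guava loaded from " + locate(SPLITTER)
                    + " lacks Splitter.splitToList(CharSequence) (added in Guava 15); "
                    + "Optimaize language-detector 0.5 expects Guava 18.");
        }
    }

    public static void main(String[] args) {
        System.out.println("Splitter loaded from: " + locate(SPLITTER));
        System.out.println("splitToList available: "
                + hasMethod(SPLITTER, "splitToList", CharSequence.class));
    }
}
```

For example, a call to GuavaProbe.requireSplitToList() before the detector's models are loaded (the exact placement is an assumption) would report the path of the winning guava-11.jar instead of the stack trace above. The class only diagnoses the conflict. It does not resolve it.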
      


          People

            Assignee: Søren Vejrup Carlsen (Inactive)
            Reporter: Søren Vejrup Carlsen (Inactive)
            Watchers: 1
