Details
Improvement
Resolution: Fixed
Critical
None
None
None
Sprint 2, Sprint 3 - webdanica, Sprint 4 - webdanica, Sprint 5 - webdanica
Description
Language identification should not be confused by multiple languages in the extracted text.
This was observed using Tika's
org.apache.tika.language.LanguageIdentifier.LanguageIdentifier(String content).getLanguage()
which was misled into reporting the language as Dutch.
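
One possible way to make identification less sensitive to mixed-language text is to detect the language per chunk and pick the language that covers the most characters. The sketch below illustrates that idea only; it is not necessarily what was implemented for this issue. The class name `ChunkedLanguageVote`, the blank-line chunking rule, and the `toyDetector` stand-in are all assumptions made for illustration, and a real detector would be plugged in through the `detector` parameter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: votes on the dominant language of a mixed-language text.
class ChunkedLanguageVote {

    /**
     * Splits the text on blank lines, asks the detector for each chunk's
     * language, and returns the language that covers the most characters.
     * Returns "unknown" if no chunk yields a language.
     */
    static String dominantLanguage(String text, Function<String, String> detector) {
        Map<String, Integer> charsPerLanguage = new HashMap<>();
        for (String chunk : text.split("\\n\\s*\\n")) {
            String trimmed = chunk.trim();
            if (trimmed.isEmpty()) {
                continue; // nothing to detect in an empty chunk
            }
            String lang = detector.apply(trimmed);
            if (lang == null) {
                continue; // detector could not decide for this chunk
            }
            // Weight each vote by chunk length so short footers count less.
            charsPerLanguage.merge(lang, trimmed.length(), Integer::sum);
        }
        String best = "unknown";
        int bestChars = 0;
        for (Map.Entry<String, Integer> entry : charsPerLanguage.entrySet()) {
            if (entry.getValue() > bestChars) {
                best = entry.getKey();
                bestChars = entry.getValue();
            }
        }
        return best;
    }

    // Toy stand-in for a real detector (assumption, for demonstration only):
    // looks for a few common Danish or English function words.
    static String toyDetector(String chunk) {
        String lower = chunk.toLowerCase();
        if (lower.contains(" og ") || lower.contains(" er ")) {
            return "da";
        }
        if (lower.contains(" the ") || lower.contains(" is ")) {
            return "en";
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "Dette er en dansk side og den handler om sprog.\n\n"
                + "This is the English footer.";
        System.out.println(dominantLanguage(text, ChunkedLanguageVote::toyDetector));
    }
}
```

With Tika on the classpath, the detector argument could be something like `chunk -> new LanguageIdentifier(chunk).getLanguage()`, so that each chunk is identified separately instead of the whole extracted text at once.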