WebDanica / WEBDAN-272

Content on web pages is automatically changed to country language


Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • None
    • None
    • webdanicasprint - vinter 2017

    Description

      Content on web pages is automatically changed to country language

      This issue occurs with, e.g., Pinterest's web site, where every regional site is served in Danish:

      https://www.pinterest.at/
      https://www.pinterest.ca/
      https://www.pinterest.ch/
      https://www.pinterest.cl/
      https://www.pinterest.co.kr/
      https://www.pinterest.co.uk/
      https://www.pinterest.com.mx/
      https://www.pinterest.com/
      https://www.pinterest.de/
      https://www.pinterest.es/
      https://www.pinterest.fr/
      https://www.pinterest.ie/
      https://www.pinterest.nz/
      https://www.pinterest.pt/
      https://www.pinterest.se/

      When these are all harvested, WebDanica will analyse them as Danish, because the content on all the regional pages is in Danish. The page isn't redirected, so the content has already been changed before you reach the site.

      I can see in a Firefox browser that the request headers include the following:

      Accept-Language:"da,en-US;q=0.7,en;q=0.3" 
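
To make the header above concrete, here is a minimal Python sketch of how a server might rank the languages in an Accept-Language value. The function name `parse_accept_language` is hypothetical, and this is not necessarily how Pinterest chooses its content. It only illustrates why `da` would be preferred for this browser.

```python
def parse_accept_language(value):
    """Return (language, q) pairs sorted by descending preference."""
    ranked = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as lowest preference
        ranked.append((lang.strip(), q))
    # sorted() is stable, so equal weights keep their header order
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


header = "da,en-US;q=0.7,en;q=0.3"
print(parse_accept_language(header))
```

A request that carries no Accept-Language header at all gives the server nothing to rank, so it would have to fall back on some other signal.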
      

      This could be the problem, but our Heritrix harvester does not send that header; it only sends this:

      <!-- Accept headers for HTTP fetching -->
      <property name="acceptHeaders">
          <list>
              <value>Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8</value>
          </list>
      </property>
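
One way to check whether the missing Accept-Language header matters is to fetch the same page with and without it and compare the declared page language. The sketch below uses only the Python standard library. The helper names `build_request`, `html_lang` and `fetch_lang` are hypothetical, and it assumes the site declares its language in the `lang` attribute of the `<html>` tag, which may not hold for every page. Since it runs outside Heritrix, and possibly from a different IP address, it can only hint at the cause.

```python
import re
import urllib.request

# The Accept header value our Heritrix configuration sends (copied from above).
HARVESTER_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"


def build_request(url, accept_language=None):
    """Build a request like the harvester's, optionally adding Accept-Language."""
    headers = {"Accept": HARVESTER_ACCEPT}
    if accept_language is not None:
        headers["Accept-Language"] = accept_language
    return urllib.request.Request(url, headers=headers)


def html_lang(html):
    """Return the lang attribute of the <html> tag, or None if it is absent."""
    match = re.search(
        r"<html\b[^>]*?\blang\s*=\s*[\"']?([A-Za-z0-9-]+)", html, re.IGNORECASE
    )
    return match.group(1) if match else None


def fetch_lang(url, accept_language=None):
    """Fetch the start of a page and report the language it declares."""
    request = build_request(url, accept_language)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read(65536).decode("utf-8", errors="replace")
    return html_lang(body)
```

Comparing `fetch_lang("https://www.pinterest.de/")` with `fetch_lang("https://www.pinterest.de/", "de")` would then show whether the header changes the result. If both calls report the same language, the IP address becomes the more likely explanation.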
      

      It could even be the IP address that the site looks at. Either way, this issue produces false positives among WebDanica's Danish seeds.

          People

            svc Søren Vejrup Carlsen (Inactive)
            sthu Stephen Hunt
