Agenda for the joint BNF, ONB, SB and KB NetarchiveSuite tele-conference, August 21st 2012, 13:00-14:00.
- Mikis will establish the Skype conference at 13:00 (please do not connect yourself).
- TDC tele-conference (fallback if the Skype conference cannot be established):
- Dial in number (+45) 70 26 50 45
- Dial in code 9064479#
- Bridgit: The Bridgit conference will be available about 5 minutes before the start of the meeting. The Bridgit URL is konf01.statsbiblioteket.dk. The Bridgit password is sbview.
- BNF: Nicolas, Sara (wasn't able to participate).
- ONB: Michaela and Andreas
- KB: Tue, Søren and Nicholas
- SB: Colin and Mikis, Sabine (Sabine and Colin weren't present).
- Any other issues to be discussed on today's tele-conference?
Heritrix 3 in NetarchiveSuite
- The week of 17 September. Søren will go to BJ on 20-21 September.
- Issue for planning: NAS-2066 Heritrix roadmap Workshop.
JhoNAS status (Nicholas)
A status update from the beginning of August was sent to the PWG and is accessible from this link: jhonas-project-status-aug.pdf
- Shared testing of WARC functionality?
Move source code to GitHub?
I think we should consider moving the code to GitHub because:
- Git is much more flexible than Subversion, see "3 Reasons to Switch to Git from Subversion", "GitSvnComparison", "svn - git vs Subversion - pros and cons", "Why You Should Switch from Subversion to Git".
- We would be moving the code to a standard open source hosting site, which will increase accessibility.
- GitHub is great!
Iteration 52 (3.21 development release) (Mikis)
Status of the production sites
- Netarkivet: TLR
The second broad crawl of 2012 (NR 15) was finished in early July.
The third broad crawl of 2012 (NR 16) was started on August 14th using 3.18.3. Step 1 is almost finished.
Version 3.20.* is currently in acceptance testing and we are preparing for production in mid-October. We have found 2 issues, which Søren is looking into.
Our Wayback is now indexed up to July 2012 and I'm preparing/testing automatic indexing in production.
Thanks to Jon and his son we have downloaded thousands of YouTube videos over the last month.
During the summer we had 2 production issues without major impact on the system:
1) The SB SAN pillar was down for one day without affecting any harvesting, because the KB site was running and all harvesters at SB were independent servers with their own disk storage.
2) We lost 1 day of harvesting because our admin server ran out of process resources. We are still investigating the logs for further explanations.
3 questions for BNF:
1) Can you show "Show comments" for harvested facebook.com sites?
2) If you harvest YouTube and download videos, how do you link the YouTube "metadata" page with the actual video URL?
3) Which PostgreSQL version are you using in production - 8.4?
- Netarkivet: SAS (as of a month ago)
As our broad crawls have been sped up to last less than 2 months, we took advantage of the break between two broad crawls:
- To crawl “very big web sites” (such as the Danish national broadcaster dr.dk and our other main TV station tv2.dk) in depth.
- To crawl websites of ministries, departments etc. in depth.
- To capture URLs of YouTube videos on and by political parties.
We started our own event crawl on the Olympics in London: entering URLs into the system, QA and monitoring.
As to our selective crawls: “business as usual”, that is to say: analysis of “candidates” (new sites proposed for selective crawls), QA of selective crawls, monitoring of harvest jobs, and revision of harvest profiles.
Date for NAS workshop at SB