Class HadoopJobUtils
- java.lang.Object
  - dk.netarkivet.common.utils.hadoop.HadoopJobUtils
public class HadoopJobUtils extends java.lang.Object
Utilities for Hadoop jobs.
-
Field Summary
Fields (modifier and type, field):
- static java.lang.String DEFAULT_FILESYSTEM
- static java.lang.String MAPREDUCE_FRAMEWORK
- static java.lang.String YARN_RESOURCEMANAGER_ADDRESS
-
Method Summary
Static methods (modifier and type, method, description):
- static java.util.List<java.lang.String> collectOutputLines(org.apache.hadoop.fs.FileSystem fileSystem, org.apache.hadoop.fs.Path outputFolder)
  Collects lines from a job's output files at a specified path.
- static void doKerberosLogin()
  Log in to Kerberos using the settings specified in CommonSettings.
- static java.util.List<CDXRecord> getCDXRecordListFromCDXLines(java.util.List<java.lang.String> cdxLines)
  Converts a list of CDX line strings to a list of CDXRecords.
- static org.apache.hadoop.conf.Configuration getConf()
  Initialize a Hadoop configuration.
- static org.apache.hadoop.security.UserGroupInformation getUserGroupInformation()
  Obtain a logged-in UserGroupInformation for running Hadoop jobs from the Kerberos parameters defined in CommonSettings.
- static void writeHadoopInputFileLinesToInputFile(java.util.List<java.nio.file.Path> files, java.nio.file.Path inputFilePath)
  Given a list of file paths, prepend 'file://' to every entry and write them as newline-separated lines to the given input file path.
-
Field Detail
-
DEFAULT_FILESYSTEM
public static final java.lang.String DEFAULT_FILESYSTEM
- See Also: Constant Field Values
-
MAPREDUCE_FRAMEWORK
public static final java.lang.String MAPREDUCE_FRAMEWORK
- See Also: Constant Field Values
-
YARN_RESOURCEMANAGER_ADDRESS
public static final java.lang.String YARN_RESOURCEMANAGER_ADDRESS
- See Also: Constant Field Values
-
Method Detail
-
getUserGroupInformation
public static org.apache.hadoop.security.UserGroupInformation getUserGroupInformation() throws sun.security.krb5.KrbException, java.io.IOException
Obtain a logged-in UserGroupInformation for running Hadoop jobs from the Kerberos parameters defined in CommonSettings.
- Returns: The UserGroupInformation instance.
- Throws:
  sun.security.krb5.KrbException - if the Kerberos configuration is invalid
  java.io.IOException - if the Kerberos login fails
-
doKerberosLogin
public static void doKerberosLogin() throws sun.security.krb5.KrbException, java.io.IOException
Log in to Kerberos using the settings specified in CommonSettings.
- Throws:
  sun.security.krb5.KrbException - if the Kerberos configuration is invalid
  java.io.IOException - if the Kerberos login fails
-
getConf
public static org.apache.hadoop.conf.Configuration getConf()
Initialize a Hadoop configuration. The basic configuration must be in a directory on the classpath. This method additionally sets the path to the uber jar specified in CommonSettings#HADOOP_MAPRED_UBER_JAR.
- Returns: A new configuration to use for a job.
-
writeHadoopInputFileLinesToInputFile
public static void writeHadoopInputFileLinesToInputFile(java.util.List<java.nio.file.Path> files, java.nio.file.Path inputFilePath) throws java.io.IOException
Given a list of file paths, prepend 'file://' to every entry and write them as newline-separated lines to the given input file path.
- Parameters:
  files - A list of input file paths to operate on
  inputFilePath - The path of the file to write the lines to
- Throws:
  java.io.IOException - If the input file path cannot be written to
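The behaviour described for this method can be illustrated with plain java.nio. This is a minimal sketch under stated assumptions, not the NetarchiveSuite implementation: the class name InputFileSketch and the method name writeInputFileLines are hypothetical, and both the UTF-8 encoding and the use of Path.toString() for each entry are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the documented behaviour only.
final class InputFileSketch {

    // Prefixes every path with "file://" and writes one entry per line.
    static void writeInputFileLines(List<Path> files, Path inputFilePath) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            // Assumption: the path's string form is used as-is after the prefix.
            lines.add("file://" + file.toString());
        }
        // Files.write terminates each line with the platform line separator
        // and overwrites the target file if it already exists.
        Files.write(inputFilePath, lines, StandardCharsets.UTF_8);
    }
}
```

On a Unix-like system, passing the paths /data/a.warc and /data/b.warc would produce a file containing the lines file:///data/a.warc and file:///data/b.warc.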
-
collectOutputLines
public static java.util.List<java.lang.String> collectOutputLines(org.apache.hadoop.fs.FileSystem fileSystem, org.apache.hadoop.fs.Path outputFolder) throws java.io.IOException
Collects lines from a job's output files at a specified path. Also deletes the folder once the output has been collected.
- Parameters:
  fileSystem - The filesystem that the result is collected from.
  outputFolder - The output folder to find the job result files in.
- Returns: A list of lines collected from all the output files.
- Throws:
  java.io.IOException - If the output folder or its contents cannot be read.
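The collect-then-delete behaviour described for this method can be sketched against the local filesystem with java.nio. This is an analogue only, not the Hadoop-based implementation: the real method works through org.apache.hadoop.fs.FileSystem, whereas the class OutputCollectorSketch and the method collectLocalOutputLines here are hypothetical. The sketch also assumes a flat output folder, UTF-8 content, and that files are read in sorted name order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue of the documented behaviour.
final class OutputCollectorSketch {

    // Reads every line of every regular file in the folder, then deletes the folder.
    static List<String> collectLocalOutputLines(Path outputFolder) throws IOException {
        List<Path> outputFiles;
        try (Stream<Path> entries = Files.list(outputFolder)) {
            // Assumption: only regular files directly inside the folder are output files.
            outputFiles = entries.filter(Files::isRegularFile)
                                 .sorted()
                                 .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path file : outputFiles) {
            lines.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        // Remove the collected files first, then the now-empty folder.
        // Files.delete fails if the folder still holds other entries (e.g. subdirectories).
        for (Path file : outputFiles) {
            Files.delete(file);
        }
        Files.delete(outputFolder);
        return lines;
    }
}
```

Because the folder is deleted after collection, a caller that needs the raw output files afterwards would have to copy them before calling a method of this kind.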
-
getCDXRecordListFromCDXLines
public static java.util.List<CDXRecord> getCDXRecordListFromCDXLines(java.util.List<java.lang.String> cdxLines)
Converts a list of CDX line strings to a list of CDXRecords.
- Parameters:
  cdxLines - The list to convert
- Returns: A list of CDXRecords representing the input list