public class BuildHostMap extends java.lang.Object
All processing is performed by the static method run(BufferedReader, PrintStream, DataOutputStream, DataOutputStream, boolean, ProgressLogger).
Warning: this class provides a main method that writes the host list to standard output, but it also performs some logging, so make sure that logging is not directed to standard output.
This method reads URLs and writes hosts (or, possibly, top private domains), together with a map from URLs to hosts and a host count.
public static void run(java.io.BufferedReader br, java.io.PrintStream hosts, java.io.DataOutputStream mapDos, java.io.DataOutputStream countDos, boolean topPrivateDomain, ProgressLogger pl) throws java.io.IOException, java.net.URISyntaxException
Warning: presently, this method uses an Object2IntOpenHashMap to store the map from host names to host indices. Thus, it cannot handle more than ≈700 million hosts.
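The following is a minimal sketch of the kind of processing described above, written with the standard library only. It is not the implementation of this class. The class name HostMapSketch is hypothetical, java.util.HashMap stands in for Object2IntOpenHashMap, and two behaviors are assumptions made for illustration: hosts are numbered in order of first appearance, and a URL without a host is rejected. Top private domains and progress logging are left out.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the documented library.
public class HostMapSketch {

    /** Reads one URL per line; prints each distinct host once, writes one long per URL and one long per host. */
    public static void run(BufferedReader br, PrintStream hosts, DataOutputStream mapDos, DataOutputStream countDos)
            throws IOException, URISyntaxException {
        // Host name -> host index (assumption: indices follow order of first appearance).
        Map<String, Integer> hostIndex = new HashMap<>();
        // counts.get(i) is the number of URLs seen so far for host i.
        List<Long> counts = new ArrayList<>();

        String line;
        while ((line = br.readLine()) != null) {
            String host = new URI(line).getHost();
            // Assumption of this sketch: URLs without a host are treated as errors.
            if (host == null) throw new URISyntaxException(line, "URL has no host");

            Integer known = hostIndex.get(host);
            int index;
            if (known == null) {
                index = hostIndex.size();
                hostIndex.put(host, index);
                hosts.println(host);
                counts.add(0L);
            } else {
                index = known;
            }
            counts.set(index, counts.get(index) + 1);
            mapDos.writeLong(index); // one long per URL
        }
        for (long count : counts) countDos.writeLong(count); // one long per host
        hosts.flush();
        mapDos.flush();
        countDos.flush();
    }

    public static void main(String[] args) throws Exception {
        String urls = "http://a.example/x\nhttp://b.example/\nhttp://a.example/y\n";
        ByteArrayOutputStream map = new ByteArrayOutputStream();
        ByteArrayOutputStream count = new ByteArrayOutputStream();
        // Hosts go to standard output, so diagnostics go to standard error.
        run(new BufferedReader(new StringReader(urls)), System.out,
                new DataOutputStream(map), new DataOutputStream(count));

        // Read the URL-to-host map back: one long per input URL.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(map.toByteArray()));
        for (int i = 0; i < 3; i++) System.err.println("URL " + i + " -> host " + in.readLong());
    }
}
```

Because each URL and each host occupies exactly eight bytes in its stream, the entry for the i-th URL or host in this sketch starts at byte offset 8·i.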
br - the buffered reader returning the list of URLs.
hosts - the print stream where hosts will be printed.
mapDos - the data output stream where the map from URLs to hosts will be written (one long per URL).
countDos - the data output stream where the host counts will be written (one long per host).
topPrivateDomain - if true, we use InternetDomainName.topPrivateDomain() to map to top private domains, rather than hosts.
pl - a progress logger, or