Package it.unimi.dsi.big.webgraph
Class BuildHostMap
java.lang.Object
it.unimi.dsi.big.webgraph.BuildHostMap
A class computing host-related data given a list of URLs (usually, the URLs of the nodes of a web graph).
All processing is performed by the static utility method run(BufferedReader, PrintStream, DataOutputStream, DataOutputStream, boolean, ProgressLogger).
Warning: the main method of this class writes the host list to standard output and also performs some logging, so make sure that logging is not directed to standard output.
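To make the three outputs concrete, here is a minimal sketch that uses only the Java standard library. It is not the implementation used by this class. The names `HostMapSketch`, `Result` and `compute` are hypothetical, hosts are assumed to be numbered in first-seen order, and the "host count" is assumed to be the number of URLs per host. Top-private-domain mapping is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostMapSketch {
    /** Hypothetical holder for the three outputs (not part of WebGraph). */
    public static final class Result {
        /** Distinct hosts; the first-seen ordering is an assumption. */
        public final List<String> hosts = new ArrayList<>();
        /** One host index per input URL. */
        public final List<Long> urlToHost = new ArrayList<>();
        /** Number of URLs seen for each host (assumed meaning of "host count"). */
        public final List<Long> hostCounts = new ArrayList<>();
    }

    public static Result compute(BufferedReader br) throws IOException, URISyntaxException {
        Result r = new Result();
        Map<String, Integer> hostIndex = new HashMap<>();
        String line;
        while ((line = br.readLine()) != null) {
            String host = new URI(line.trim()).getHost();
            if (host == null) throw new URISyntaxException(line, "no host component");
            Integer known = hostIndex.get(host);
            int i;
            if (known == null) {
                // New host: assign the next free index and start its counter at zero.
                i = hostIndex.size();
                hostIndex.put(host, i);
                r.hosts.add(host);
                r.hostCounts.add(0L);
            } else {
                i = known;
            }
            r.urlToHost.add((long) i);
            r.hostCounts.set(i, r.hostCounts.get(i) + 1);
        }
        return r;
    }

    public static void main(String[] args) throws Exception {
        String urls = "http://a.example/x\nhttp://b.example/\nhttp://a.example/y\n";
        Result r = compute(new BufferedReader(new StringReader(urls)));
        System.out.println(r.hosts);
        System.out.println(r.urlToHost);
        System.out.println(r.hostCounts);
    }
}
```

The sketch keeps everything in memory for readability; the documented method writes its results to the supplied streams instead.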
Author: Sebastiano Vigna
Field Summary
Constructor Summary
Method Summary
Modifier and Type | Method | Description
static void | run(BufferedReader br, PrintStream hosts, DataOutputStream mapDos, DataOutputStream countDos, boolean topPrivateDomain, ProgressLogger pl) | This method reads URLs and writes hosts (or, possibly, top private domains), together with a map from URLs to hosts and a host count.
Field Details
DOTTED_ADDRESS
Constructor Details
BuildHostMap
public BuildHostMap()
Method Details
run
public static void run(BufferedReader br, PrintStream hosts, DataOutputStream mapDos, DataOutputStream countDos, boolean topPrivateDomain, ProgressLogger pl) throws IOException, URISyntaxException
This method reads URLs and writes hosts (or, possibly, top private domains), together with a map from URLs to hosts and a host count.
Warning: presently, this method uses an Object2IntOpenHashMap to store the map from host names to host indices. Thus, it cannot handle more than ≈700 million hosts.
Parameters:
br - the buffered reader returning the list of URLs.
hosts - the print stream where hosts will be printed.
mapDos - the data output stream where the map from URLs to hosts will be written (one long per URL).
countDos - the data output stream where the host counts will be written (one long per host).
topPrivateDomain - if true, we use InternetDomainName.topPrivateDomain() to map to top private domains, rather than hosts.
pl - a progress logger, or null.
Throws:
IOException
URISyntaxException
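Since the map and the counts are written as one long per entry, they can be read back with a DataInputStream. The following is a minimal sketch, assuming the values were written with DataOutputStream.writeLong (8 bytes, big-endian); the class name `ReadLongs` and the helper `readAll` are hypothetical and not part of WebGraph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ReadLongs {
    /**
     * Reads big-endian longs until the end of the stream.
     * A truncated trailing record (fewer than 8 bytes) is silently ignored.
     */
    public static List<Long> readAll(InputStream in) throws IOException {
        List<Long> values = new ArrayList<>();
        try (DataInputStream dis = new DataInputStream(in)) {
            while (true) {
                try {
                    values.add(dis.readLong());
                } catch (EOFException endOfStream) {
                    break;
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a small map output in memory: three URLs mapped to hosts 0, 1, 0.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            for (long hostIndex : new long[] { 0, 1, 0 }) dos.writeLong(hostIndex);
        }
        System.out.println(readAll(new ByteArrayInputStream(bytes.toByteArray())));
    }
}
```

For real files, the in-memory stream would be replaced by a buffered FileInputStream; for very large maps, collecting into a boxed list is wasteful and a primitive array or streaming pass would be preferable.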
main