WebGraph is a framework for graph compression aimed at studying web graphs. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

  1. A set of simple codes, called ζ codes, which are particularly suitable for storing web graphs (or, in general, integers with a power-law distribution in a certain exponent range).
  2. Algorithms for compressing web graphs that exploit gap compression and differential compression (à la LINK), intervalisation and ζ codes to provide a high compression ratio (see our datasets). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
  3. Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
  4. Algorithms for analysing very large graphs, such as HyperBall, which has been used to show that Facebook has just four degrees of separation.
  5. This package, providing a complete, documented implementation of the algorithms above in Java. It is free software distributed under either the GNU Lesser General Public License 2.1+ or the Apache Software License 2.0.
  6. Data sets for very large graph (e.g., a billion of links). These are either gathered from public sources (such as WebBase), or gathered by UbiCrawler.

In the end, with WebGraph you can access and analyse very large web graphs. Using WebGraph is as easy as installing a few jar files and downloading a data set.

You are welcome to use and improve WebGraph! If you find our software useful for your research, please quote our paper “The WebGraph Framework I: Compression Techniques”, by Paolo Boldi and Sebastiano Vigna, in Proc. of the Thirteenth World–Wide Web Conference, pages 595−601, 2004, ACM Press.

Looking around

For in-depth information on the Webgraph framework, you should have a look at its home page, where you can find some papers about the compression techniques it uses. Datasets are available at the LAW web site.

The classes of interest for the casual Webgraph user are ImmutableGraph, which specifies the access methods for an immutable graph, BVGraph, which makes it possible to retrieve or recompress a graph stored in the format described in The WebGraph Framework I: Compression Techniques, EFGraph, which provides a quasi-succinct representation using the Elias–Fano representation of monotone sequences, and Transform, which provides several ways to transform an ImmutableGraph.

If you plan on building your graphs dynamically, the class ArrayListMutableGraph makes it possible to create incrementally a graph and then extract an immutable view.

The package it.unimi.dsi.webgraph.examples contains useful examples that show how to access sequentially and randomly an immutable graph.

Exporting to other formats

ASCIIGraph and ArcListASCIIGraph have main methods that can be used to save an immutable graph, as long as you can load it, in ASCII form. With data in BVGraph or EFGraph format this is as simple as

java -server it.unimi.dsi.webgraph.ASCIIGraph sourcebasename dest
or
java -server it.unimi.dsi.webgraph.ArcListASCIIGraph sourcebasename dest

Please consult the documentation and the command-line help of these two classes to get more information.

Importing your data

If you want to import your own data into WebGraph, you must write an implementation of ImmutableGraph that exposes your data. A simple example is given in IntegerListImmutableGraph, a stub class exposing a simple, noncompressed binary format as an ImmutableGraph. Once your data is exposed in that way, you can get a compressed version using the store() method of your class of interest. Often, there is a main method (see, e.g., BVGraph) that will load your class and invoke store() for you.

For example, you can use an immutable graph inside the Jung framework using our JungAdapter.

As an alternative, the class ASCIIGraph can be used to read graphs specified in a very simple ASCII format. The class implements ASCIIGraph.loadOnce(java.io.InputStream) so that the file can be just piped into a class offering a main method that supports loadOnce() (e.g., BVGraph). You can also generate a graph in ASCII format and read it using ASCIIGraph.loadOffline(CharSequence)—the graph will not be loaded into main memory.

ASCIIGraph requires listing the successors of each node on a separate line. If your graph is specified arc by arc (one arc per line) you can use ArcListASCIIGraph instead. ShiftedByOneArcListASCIIGraph can be used if your input data numbers (rather insensibly) nodes starting from one.

Another possibility is to specify your graph incrementally. which just involves enumerating arrays of successors for each node.

Importing your labelled data

Arc-labelled graphs are represented using implementations of ArcLabelledImmutableGraph. Most arc-labelled graphs are based on an underlying ImmutableGraph, and the ArcLabelledImmutableGraph implementation just provides label handling. The example IntegerTriplesArcLabelledImmutableGraph shows how to expose your data as an instance of ArcLabelledImmutableGraph, so you can save your data using your preferred combination of implementations.

Packages
Package Description
it.unimi.dsi.webgraph
Main classes implementing the WebGraph algorithms.
it.unimi.dsi.webgraph.algo
Classes implementing useful algorithms on graphs.
it.unimi.dsi.webgraph.examples
Example classes that do nice things using the WebGraph framework.
it.unimi.dsi.webgraph.jung  
it.unimi.dsi.webgraph.labelling
Main classes implementing labelling for immutable graphs.
it.unimi.dsi.webgraph.test