Class ImmutableGraph

java.lang.Object
it.unimi.dsi.webgraph.ImmutableGraph
All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>
Direct Known Subclasses:
ArcLabelledImmutableGraph, BVGraph, EFGraph, ImmutableSequentialGraph, ImmutableSubgraph, UnionImmutableGraph

public abstract class ImmutableGraph
extends Object
implements FlyweightPrototype<ImmutableGraph>
A simple abstract class representing an immutable graph.

Subclasses of this class are used to create and access immutable graphs, that is, graphs that are computed once for all, stored conveniently, and then accessed repeatedly. Moreover, immutable graphs are usually very large—so large that two such graphs may not fit into central memory (the main example being a sizable portion of the web).

A subclass of this class must implement methods to obtain the number of nodes, the outdegree of a node and the successors of a node (either successors(int) or successorArray(int)). Additionally, it may provide methods to obtain the number of arcs, and a basename.

This class provides equals(Object) and hashCode() methods that consider two graph equals if they have the same size and all their successor lists are equal.

Iterating on successors

Starting with WebGraph 2.0, the iterator architecture is fully lazy—you have no hasNext() method. Rather, the LazyIntIterator returned by successors(int) will return -1 when no more successors are available. The idiomatic forms for enumerating successors via iterators are

 LazyIntIterator successors = g.successors(x);
 int d = g.outdegree(x);
 while(d-- != 0) doSomething(successors.nextInt());
 
and
 LazyIntIterator successors = g.successors(x);
 int t;
 while((t = successors.nextInt()) != -1) doSomething(t);
 

The alternative method successorArray(int) provides an array containing the successors and possibly more elements. Use outdegree(int) to know how many elements are valid. The efficiency of successors(int) and successorArray(int) may vary depending on the implementation.

Iterating on a graph in parallel

You can scan a graph sequentially using node iterators. Starting with version 3.5.0, implementations of this class may return true on hasCopiableIterators(), which means that node iterators implement the optional copy(int) method. Using copy(int), the method splitNodeIterators(int) of this class is able to provide separate, thread-safe iterators on different segments of contiguous nodes of the graph. The class BVGraph, for example, uses this interface to provide parallel compression. We suggest that all classes providing parallel iteration read the system variable "it.unimi.dsi.webgraph.threads" to override the number of parallel threads.

Building an immutable graph

Due to their large size, immutable graphs have a peculiar serialisation scheme. Every subclass of this class must implement a number of static methods that create an immutable graph, given a string (usually a basename for a set of files) and, optionally, a ProgressLogger. The signatures that must be implemented are

  • ImmutableGraph load(CharSequence, ProgressLogger);
  • ImmutableGraph load(CharSequence);
  • ImmutableGraph loadOffline(CharSequence, ProgressLogger);
  • ImmutableGraph loadOffline(CharSequence).
  • ImmutableGraph loadOnce(InputStream);

Additionally, the following signatures can be implemented:

  • ImmutableGraph loadMapped(CharSequence, ProgressLogger);
  • ImmutableGraph loadMapped(CharSequence);

The special semantics associated to loadOffline() is that the immutable graph should be set up, and possibly some metadata could be read from disk, but no actual data is loaded into memory; the class should guarantee that offline sequential access (i.e., by means of nodeIterator(int)) is still possible. In other words, in most cases nodeIterator(int) will have to be overridden by the subclasses to behave properly even in an offline setting (see nodeIterator()). The special semantics associated with loadOnce() is that the graph can be traversed just once using a call to nodeIterator(). The special semantics associated with loadMapped() is that metadata could be read from disk, but the graph will be accessed by memory mapping; the class should guarantee that random access is possible.

Note that a simple class may just implement all special forms of graph loading delegating to the standard load method (see, e.g., ASCIIGraph). Specific implementations of ImmutableGraph may also decide to expose internal load methods to make it easier to write load methods for subclasses (see, e.g., loadInternal()).

Analogously, a subclass of this class may also implement

  • store(ImmutableGraph, CharSequence, ProgressLogger);
  • store(ImmutableGraph, CharSequence).
These methods must store in compressed form a given immutable graph, using the default values for compression parameters, etc. It is likely, however, that more of store methods are available, as parameters vary wildly from subclass to subclass. The method store(Class, ImmutableGraph, CharSequence, ProgressLogger) invokes by reflection the methods above on the provided class.

The standard method to build a new immutable graph is creating a (possibly anonymous) class that extends this class, and save it using a concrete subclass (e.g., BVGraph). See the source of Transform for several examples.

Properties Conventions

To provide a simple way to load an immutable graph without knowing in advance its class, the following convention may be followed: a graph with basename name may feature a Java property file name.properties with a property graphclass containing the actual class of the graph. In this case, you can use the implementation of the load/store methods contained in this class, similarly to the standard Java serialisation scheme. BVGraph, for instance, follows this convention, but ASCIIGraph does not.

The reason why this convention is not enforced is that it is sometimes useful to write lightweight classes, mostly for debugging purposes, whose graph representation is entirely contained in a single file (e.g., ASCIIGraph), so that loadOnce(InputStream) can be easily implemented.

Facilities for loading an immutable graph

ImmutableGraph provides ready-made implementations of the load methods that work as follows: they opens a property file with the given basename, and look for the graphclass property; then, they simply delegates the actual load to the specified graph class by reflection.

Thread-safety and flyweight copies

Implementations of this class need not be thread-safe. However, they implement the FlyweightPrototype pattern: the copy() method is thread-safe and will return a lightweight copy of the graph—usually, all immutable data will be shared between copies. Concurrent access to different copies is safe.

Note that by contract copy() is guaranteed to work only if randomAccess() returns true.

  • Field Details

    • GRAPHCLASS_PROPERTY_KEY

      public static final String GRAPHCLASS_PROPERTY_KEY
      See Also:
      Constant Field Values
    • PROPERTIES_EXTENSION

      public static final String PROPERTIES_EXTENSION
      The standard extension of property files.
      See Also:
      Constant Field Values
    • NUMBER_OF_THREADS_PROPERTY

      public static final String NUMBER_OF_THREADS_PROPERTY
      The property used to set the number of parallel compression threads.
      See Also:
      Constant Field Values
  • Constructor Details

    • ImmutableGraph

      public ImmutableGraph()
  • Method Details

    • numNodes

      public abstract int numNodes()
      Returns the number of nodes of this graph.

      Albeit this method is not optional, it is allowed that this method throws an UnsupportedOperationException if this graph has never been entirely traversed using a node iterator. This apparently bizarre behaviour is necessary to support implementations as ArcListASCIIGraph, which do not know the actual number of nodes until a traversal has been completed.

      Returns:
      the number of nodes.
    • numArcs

      public long numArcs()
      Returns the number of arcs of this graph (optional operation).
      Returns:
      the number of arcs.
    • randomAccess

      public abstract boolean randomAccess()
      Checks whether this graph provides random access to successor lists.
      Returns:
      true if this graph provides random access to successor lists.
    • hasCopiableIterators

      public boolean hasCopiableIterators()
      Whether the node iterators returned by this graph support NodeIterator.copy(int).
      Returns:
      true if this graph provides copiable iterators.
      Implementation Specification:
      This implementation just returns randomAccess().
    • basename

      public CharSequence basename()
      Returns a symbolic basename for this graph (optional operation).
      Returns:
      the basename.
      Implementation Notes:
      Implementors of this class may provide a basename (usually a pathname from which various files storing the graph are stemmed). This method is optional because it is sometimes unmeaningful (e.g., for one-off anonymous classes).
    • successors

      public LazyIntIterator successors​(int x)
      Returns a lazy iterator over the successors of a given node. The iteration terminates when -1 is returned.
      Parameters:
      x - a node.
      Returns:
      a lazy iterator over the successors of the node.
      API Notes:
      The semantics of this method has been significantly modified in WebGraph 2.0 to take advantage of the new, faster lazy architecture.
      Implementation Specification:
      This implementation just wraps the array returned by successorArray(int). Subclasses are encouraged to override this implementation.
    • successorArray

      public int[] successorArray​(int x)
      Returns a reference to an array containing the successors of a given node.

      The returned array may contain more entries than the outdegree of x. However, only those with indices from 0 (inclusive) to the outdegree of x (exclusive) contain valid data.

      Parameters:
      x - a node.
      Returns:
      an array whose first elements are the successors of the node; the array must not be modified by the caller.
      Implementation Specification:
      This implementation just unwraps the iterator returned by successors(int). Subclasses are encouraged to override this implementation.
      Implementation Notes:
      All implementations must guarantee that a distinct array is returned for each node. The caller, in turn, must treat the array as a read-only object.
    • outdegree

      public abstract int outdegree​(int x)
      Returns the outdegree of a node.
      Parameters:
      x - a node.
      Returns:
      the outdegree of the given node.
      Throws:
      IllegalStateException - if called without offsets.
    • nodeIterator

      public NodeIterator nodeIterator​(int from)
      Returns a node iterator for scanning the graph sequentially, starting from the given node.
      Parameters:
      from - the node from which the iterator will iterate.
      Returns:
      a NodeIterator for accessing nodes and successors sequentially.
      Implementation Specification:
      This implementation just calls the random-access methods (successors(int) and outdegree(int)). More specific implementations may choose to maintain some extra state to make the enumeration more efficient.
    • nodeIterator

      public NodeIterator nodeIterator()
      Returns a node iterator for scanning the graph sequentially, starting from the first node.
      Returns:
      a NodeIterator for accessing nodes and successors sequentially.
    • splitNodeIterators

      public NodeIterator[] splitNodeIterators​(int howMany)
      Returns an array of node iterators, scanning each a portion of the nodes of a graph. Iterators are guaranteed to scan mutually disjoint sets of nodes, and every node is guaranteed to be scanned by one iterator.

      This is an optional operation. If implemented, though, the returned iterators must properly implement NodeIterator.copy(int).

      Parameters:
      howMany - the number of iterators to be returned (at the end of the array, some of them may be empty).
      Returns:
      the required iterators.
    • copy

      public abstract ImmutableGraph copy()
      Returns a flyweight copy of this immutable graph.
      Specified by:
      copy in interface FlyweightPrototype<ImmutableGraph>
      Returns:
      a flyweight copy of this immutable graph.
      Throws:
      UnsupportedOperationException - if flyweight copies are not supported: support is guaranteed only if randomAccess() returns true.
      See Also:
      FlyweightPrototype
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • outdegrees

      public IntIterator outdegrees()
      Returns an iterator enumerating the outdegrees of the nodes of this graph.
      Returns:
      an iterator enumerating the outdegrees of the nodes of this graph.
    • loadSequential

      @Deprecated public static ImmutableGraph loadSequential​(CharSequence basename) throws IOException
      Creates a new ImmutableGraph by loading a graph file from disk to memory, without offsets.

      This method uses the properties convention described in the introduction.

      Parameters:
      basename - the basename of the graph.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while reading the graph.
    • loadSequential

      @Deprecated public static ImmutableGraph loadSequential​(CharSequence basename, ProgressLogger pl) throws IOException
      Creates a new ImmutableGraph by loading a graph file from disk to memory, without offsets.

      This method uses the properties convention described in the introduction.

      Parameters:
      basename - the basename of the graph.
      pl - a progress logger used while loading the graph, or null.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while reading the graph.
    • loadOffline

      public static ImmutableGraph loadOffline​(CharSequence basename) throws IOException
      Creates a new ImmutableGraph by loading offline a graph file.

      This method uses the properties convention described in the introduction.

      Parameters:
      basename - the basename of the graph.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while reading the graph.
    • loadOffline

      public static ImmutableGraph loadOffline​(CharSequence basename, ProgressLogger pl) throws IOException
      Creates a new ImmutableGraph by loading offline a graph file.

      This method uses the properties convention described in the introduction.

      Parameters:
      basename - the basename of the graph.
      pl - a progress logger used while loading the graph, or null.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while reading the graph.
    • loadMapped

      public static ImmutableGraph loadMapped​(CharSequence basename, ProgressLogger pl) throws IOException
      Creates a new ImmutableGraph by memory-mapping a graph file.

      This method uses the properties convention described in the introduction.

      Parameters:
      basename - the basename of the graph.
      pl - a progress logger used while loading the offsets, or null.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while memory-mapping the graph or reading the offsets.
    • loadMapped

      public static ImmutableGraph loadMapped​(CharSequence basename) throws IOException
      Creates a new ImmutableGraph by memory-mapping a graph file.

      This method uses the properties convention described in the introduction.

      Parameters:
      basename - the basename of the graph.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while memory-mapping the graph or reading the offsets.
    • loadOnce

      public static ImmutableGraph loadOnce​(InputStream is) throws IOException
      Creates a new ImmutableGraph by loading a read-once graph from an input stream.
      Parameters:
      is - an input stream containing the graph.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while reading the graph.
      UnsupportedOperationException - if this graph class does not support read-once graphs.
      Implementation Specification:
      This implementation just throws a UnsupportedOperationException. There is no way to write a generic implementation, because there is no way to know in advance the class that should read the graph.
    • load

      public static ImmutableGraph load​(CharSequence basename) throws IOException
      Creates a new ImmutableGraph by loading a graph file from disk to memory, with all offsets, using no progress logger.

      This method uses the properties convention described in the introduction.

      Parameters:
      basename - the basename of the graph.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while reading the graph.
    • load

      public static ImmutableGraph load​(CharSequence basename, ProgressLogger pl) throws IOException
      Creates a new ImmutableGraph by loading a graph file from disk to memory, with all offsets, using a progress logger.

      This method uses the properties convention described in the introduction.

      Parameters:
      basename - the basename of the graph.
      pl - a progress logger used while loading the graph, or null.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while reading the graph.
    • load

      protected static ImmutableGraph load​(ImmutableGraph.LoadMethod method, CharSequence basename, InputStream is, ProgressLogger pl) throws IOException
      Creates a new immutable graph by loading a graph file from disk to memory, delegating the actual loading to the class specified in the graphclass property within the property file (named basename.properties). The exact load method to be used depends on the method argument.

      This method uses the properties convention described in the introduction.

      Parameters:
      method - the method to be used to load the graph.
      basename - the basename of the graph, if method is not ImmutableGraph.LoadMethod.ONCE.
      is - an input stream the containing the graph, if method is ImmutableGraph.LoadMethod.ONCE.
      pl - the progress logger; it can be null.
      Returns:
      an ImmutableGraph containing the specified graph.
      Throws:
      IOException - if an I/O exception occurs while reading the graph.
    • store

      public static void store​(Class<?> graphClass, ImmutableGraph graph, CharSequence basename, ProgressLogger pl) throws IOException
      Stores an immutable graph using a specified subclass and a progress logger.

      This method is a useful shorthand that invoke by reflection the store method of a given subclass. Note, however, that usually a subclass will provide more refined store methods with more parameters.

      Parameters:
      graphClass - the subclass of ImmutableGraph that should store the graph.
      graph - the graph to store.
      basename - the basename.
      pl - a progress logger, or null.
      Throws:
      IOException
    • store

      public static void store​(Class<?> graphClass, ImmutableGraph graph, CharSequence basename) throws IOException
      Stores an immutable graph using a specified subclass.
      Parameters:
      graphClass - the subclass of ImmutableGraph that should store the graph.
      graph - the graph to store.
      basename - the basename.
      Throws:
      IOException
      See Also:
      store(Class, ImmutableGraph, CharSequence, ProgressLogger)
    • equals

      public boolean equals​(Object o)
      Compare this immutable graph to another object.
      Overrides:
      equals in class Object
      Returns:
      true iff the given object is an immutable graph of the same size, and the successor list of every node of this graph is equal to the successor list of the corresponding node of o.
    • hashCode

      public int hashCode()
      Returns a hash code for this immutable graph.
      Overrides:
      hashCode in class Object
      Returns:
      a hash code for this immutable graph.