Class ImmutableGraph
- All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>
- Direct Known Subclasses:
ArcLabelledImmutableGraph
,BidirectionalImmutableGraph
,BVGraph
,EFGraph
,ImmutableSequentialGraph
,ImmutableSubgraph
,UnionImmutableGraph
Subclasses of this class are used to create and access immutable graphs, that is, graphs that are computed once for all, stored conveniently, and then accessed repeatedly. Moreover, immutable graphs are usually very large—so large that two such graphs may not fit into central memory (the main example being a sizable portion of the web).
A subclass of this class must implement methods to obtain the number of
nodes, the outdegree of a node and the successors of a node
(either successors(long)
or successorBigArray(long)
). Additionally, it may
provide methods to obtain the number of arcs, and a
basename.
This class provides equals(Object)
and hashCode()
methods that consider two
graph equals if they have the same size and all their successor lists are equal.
Iterating on successors
Starting with WebGraph 2.0, the iterator architecture is fully lazy—you have no
hasNext()
method. Rather, the LazyLongIterator
returned by
successors(long)
will return -1 when no more successors are available. The idiomatic
forms for enumerating successors via iterators are
LazyLongIterator successors = g.successors(x); int d = g.outdegree(x); while (d-- != 0) doSomething(successors.nextInt());and
LazyLongIterator successors = g.successors(x); int t; while ((t = successors.nextInt()) != -1) doSomething(t);
The alternative method successorBigArray(long)
provides an array containing the
successors and possibly more elements. Use outdegree(long)
to know how many
elements are valid. The efficiency of successors(long)
and
successorBigArray(long)
may vary depending on the implementation.
Iterating on a graph in parallel
You can scan a graph sequentially using node iterators. Starting with
version 3.6.0, implementations of this class may return true on hasCopiableIterators()
,
which means that node iterators implement the optional copy(long)
method. Using copy(long)
, the method
splitNodeIterators(int)
of this class is able to provide separate, thread-safe iterators
on different segments of contiguous nodes of the graph. The class BVGraph
, for example,
uses this interface to provide parallel compression. We suggest that all classes providing
parallel iteration read the system variable "it.unimi.dsi.webgraph.threads" to override the
number of parallel threads.
Building an immutable graph
Due to their large size, immutable graphs have a peculiar serialisation scheme. Every subclass of
this class must implement a number of static methods that create an immutable
graph, given a string (usually a basename for a set of files) and, optionally, a
ProgressLogger
. The signatures that must be
implemented are
ImmutableGraph load(CharSequence, ProgressLogger)
;ImmutableGraph load(CharSequence)
;ImmutableGraph loadOffline(CharSequence, ProgressLogger)
;ImmutableGraph loadOffline(CharSequence)
.ImmutableGraph loadOnce(InputStream)
;
Additionally, the following signatures can be implemented:
ImmutableGraph loadMapped(CharSequence, ProgressLogger)
;ImmutableGraph loadMapped(CharSequence)
;
The special semantics associated to loadOffline()
is that the immutable graph should
be set up, and possibly some metadata could be read from disk, but no actual data is loaded into
memory; the class should guarantee that offline sequential access (i.e., by means of
nodeIterator(long)
) is still possible. In other words, in most cases
nodeIterator(long)
will have to be overridden by the subclasses to behave properly even
in an offline setting (see nodeIterator()
). The special semantics associated with
loadOnce()
is that the graph can be traversed just once using a call to
nodeIterator()
. The special semantics associated with loadMapped()
is that
metadata could be read from disk, but the graph will be accessed by memory mapping; the class
should guarantee that random access is possible.
Note that a simple class may just implement all special forms of graph loading delegating to the
standard load method (see, e.g., ASCIIGraph
). Specific
implementations of ImmutableGraph
may also decide to expose internal load methods to make
it easier to write load methods for subclasses (see, e.g.,
loadInternal()
).
Analogously, a subclass of this class may also implement
store(ImmutableGraph, CharSequence, ProgressLogger)
;store(ImmutableGraph, CharSequence)
.
store
methods are
available, as parameters vary wildly from subclass to subclass. The method
store(Class, ImmutableGraph, CharSequence, ProgressLogger)
invokes by reflection the
methods above on the provided class.
The standard method to build a new immutable graph is creating a (possibly anonymous) class that
extends this class, and save it using a concrete subclass (e.g.,
BVGraph
). See the source of
Transform
for several examples.
Properties Conventions
To provide a simple way to load an immutable graph without knowing in advance its class, the
following convention may be followed: a graph with basename name
may
feature a Java property file name.properties
with a property
graphclass
containing the actual class of the graph. In this case, you can use the
implementation of the load/store methods contained in this class, similarly to the standard Java
serialisation scheme. BVGraph
, for instance, follows this convention, but
ASCIIGraph
does not.
The reason why this convention is not enforced is that it is sometimes useful to write
lightweight classes, mostly for debugging purposes, whose graph representation is entirely
contained in a single file (e.g., ASCIIGraph
), so that loadOnce(InputStream)
can
be easily implemented.
Facilities for loading an immutable graph
ImmutableGraph
provides ready-made implementations of the load methods that work as
follows: they opens a property file with the given basename, and look for the
graphclass
property; then, they simply delegates the actual load to the specified
graph class by reflection.
Thread-safety and flyweight copies
Implementations of this class need not be thread-safe. However, they implement the
FlyweightPrototype
pattern: the copy()
method is thread-safe and will return a
lightweight copy of the graph—usually, all immutable data will be shared between copies.
Concurrent access to different copies is safe.
Note that by contract copy()
is guaranteed to work only if randomAccess()
returns true.
-
Nested Class Summary
Modifier and TypeClassDescriptionstatic enum
A list of the methods that can be used to load a graph. -
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionbasename()
Returns a symbolic basename for this graph (optional operation).abstract ImmutableGraph
copy()
Returns a flyweight copy of this immutable graph.boolean
Compare this immutable graph to another object.boolean
Whether the node iterators returned by this graph supportNodeIterator.copy(long)
.int
hashCode()
Returns a hash code for this immutable graph.int
A method returning the number of nodes as an integer, for easier backward compatibility.protected static ImmutableGraph
load
(ImmutableGraph.LoadMethod method, CharSequence basename, InputStream is, ProgressLogger pl) Creates a new immutable graph by loading a graph file from disk to memory, delegating the actual loading to the class specified in thegraphclass
property within the property file (namedbasename.properties
).static ImmutableGraph
load
(CharSequence basename) Creates a newImmutableGraph
by loading a graph file from disk to memory, with all offsets, using no progress logger.static ImmutableGraph
load
(CharSequence basename, ProgressLogger pl) Creates a newImmutableGraph
by loading a graph file from disk to memory, with all offsets, using a progress logger.static ImmutableGraph
loadMapped
(CharSequence basename) Creates a newImmutableGraph
by memory-mapping a graph file.static ImmutableGraph
loadMapped
(CharSequence basename, ProgressLogger pl) Creates a newImmutableGraph
by memory-mapping a graph file.static ImmutableGraph
loadOffline
(CharSequence basename) Creates a newImmutableGraph
by loading offline a graph file.static ImmutableGraph
loadOffline
(CharSequence basename, ProgressLogger pl) Creates a newImmutableGraph
by loading offline a graph file.static ImmutableGraph
loadOnce
(InputStream is) Creates a newImmutableGraph
by loading a read-once graph from an input stream.static ImmutableGraph
loadSequential
(CharSequence basename) Deprecated.static ImmutableGraph
loadSequential
(CharSequence basename, ProgressLogger pl) Deprecated.Returns a node iterator for scanning the graph sequentially, starting from the first node.nodeIterator
(long from) Returns a node iterator for scanning the graph sequentially, starting from the given node.long
numArcs()
Returns the number of arcs of this graph (optional operation).abstract long
numNodes()
Returns the number of nodes of this graph.abstract long
outdegree
(long x) Returns the outdegree of a node.Returns an iterator enumerating the outdegrees of the nodes of this graph.abstract boolean
Checks whether this graph provides random access to successor lists.splitNodeIterators
(int howMany) Returns an array of node iterators, scanning each a portion of the nodes of a graph.static void
store
(Class<?> graphClass, ImmutableGraph graph, CharSequence basename) Stores an immutable graph using a specified subclass.static void
store
(Class<?> graphClass, ImmutableGraph graph, CharSequence basename, ProgressLogger pl) Stores an immutable graph using a specified subclass and a progress logger.long[][]
successorBigArray
(long x) Returns a reference to a big array containing the successors of a given node.successors
(long x) Returns a lazy iterator over the successors of a given node.toString()
static ImmutableGraph
wrap
(ImmutableGraph graph) static ImmutableGraph
wrap
(ImmutableGraph graph)
-
Field Details
-
GRAPHCLASS_PROPERTY_KEY
- See Also:
-
PROPERTIES_EXTENSION
The standard extension of property files.- See Also:
-
NUMBER_OF_THREADS_PROPERTY
The property used to set the number of parallel compression threads.- See Also:
-
-
Constructor Details
-
ImmutableGraph
public ImmutableGraph()
-
-
Method Details
-
numNodes
public abstract long numNodes()Returns the number of nodes of this graph.Albeit this method is not optional, it is allowed that this method throws an
UnsupportedOperationException
if this graph has never been entirely traversed using anode iterator
. This apparently bizarre behaviour is necessary to support implementations asArcListASCIIGraph
, which do not know the actual number of nodes until a traversal has been completed.- Returns:
- the number of nodes.
-
intNumNodes
public int intNumNodes()A method returning the number of nodes as an integer, for easier backward compatibility.- Returns:
numNodes()
, if it is smaller thanInteger.MAX_VALUE
; otherwise, an exception will be thrown.- Throws:
IllegalStateException
- ifnumNodes()
is larger thanInteger.MAX_VALUE
.
-
numArcs
public long numArcs()Returns the number of arcs of this graph (optional operation).- Returns:
- the number of arcs.
-
randomAccess
public abstract boolean randomAccess()Checks whether this graph provides random access to successor lists.- Returns:
- true if this graph provides random access to successor lists.
-
hasCopiableIterators
public boolean hasCopiableIterators()Whether the node iterators returned by this graph supportNodeIterator.copy(long)
.- Returns:
- true if this graph provides copiable iterators.
- Implementation Specification:
- This implementation just returns
randomAccess()
.
-
basename
Returns a symbolic basename for this graph (optional operation).Implementors of this class may provide a basename (usually a pathname from which various files storing the graph are stemmed). This method is optional because it is sometimes unmeaningful (e.g., for one-off anonymous classes).
- Returns:
- the basename.
-
successors
Returns a lazy iterator over the successors of a given node. The iteration terminates when -1 is returned.- Parameters:
x
- a node.- Returns:
- a lazy iterator over the successors of the node.
- Implementation Specification:
- This implementation just wraps the array returned by
successorBigArray(long)
. Subclasses are encouraged to override this implementation. - Implementation Notes:
- The semantics of this method has been significantly modified in WebGraph 2.0 to take advantage of the new, faster lazy architecture.
-
successorBigArray
public long[][] successorBigArray(long x) Returns a reference to a big array containing the successors of a given node.The returned big array may contain more entries than the outdegree of
x
. However, only those with indices from 0 (inclusive) to the outdegree ofx
(exclusive) contain valid data.- Parameters:
x
- a node.- Returns:
- a big array whose first elements are the successors of the node; the array must not be modified by the caller.
- Implementation Specification:
- This implementation just unwraps the iterator returned by
successors(long)
. Subclasses are encouraged to override this implementation.
-
outdegree
public abstract long outdegree(long x) Returns the outdegree of a node.- Parameters:
x
- a node.- Returns:
- the outdegree of the given node.
- Throws:
IllegalStateException
- if called without offsets.
-
nodeIterator
Returns a node iterator for scanning the graph sequentially, starting from the given node.- Parameters:
from
- the node from which the iterator will iterate.- Returns:
- a
NodeIterator
for accessing nodes and successors sequentially. - Implementation Specification:
- This implementation just calls the random-access methods (
successors(long)
andoutdegree(long)
). More specific implementations may choose to maintain some extra state to make the enumeration more efficient.
-
nodeIterator
Returns a node iterator for scanning the graph sequentially, starting from the first node.- Returns:
- a
NodeIterator
for accessing nodes and successors sequentially.
-
splitNodeIterators
Returns an array of node iterators, scanning each a portion of the nodes of a graph. Iterators are guaranteed to scan mutually disjoint sets of nodes, and every node is guaranteed to be scanned by one iterator.This is an optional operation. If implemented, though, the returned iterators must properly implement
NodeIterator.copy(long)
.- Parameters:
howMany
- the number of iterators to be returned (at the end of the array, some of them may be empty).- Returns:
- the required iterators.
-
copy
Returns a flyweight copy of this immutable graph.- Specified by:
copy
in interfaceFlyweightPrototype<ImmutableGraph>
- Returns:
- a flyweight copy of this immutable graph.
- Throws:
UnsupportedOperationException
- if flyweight copies are not supported: support is guaranteed only ifrandomAccess()
returns true.- See Also:
-
outdegrees
Returns an iterator enumerating the outdegrees of the nodes of this graph.- Returns:
- an iterator enumerating the outdegrees of the nodes of this graph.
-
toString
-
loadSequential
Deprecated.UseloadOffline(CharSequence)
orloadMapped(CharSequence)
instead.Creates a newImmutableGraph
by loading a graph file from disk to memory, without offsets.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
loadSequential
@Deprecated public static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException Deprecated.Creates a newImmutableGraph
by loading a graph file from disk to memory, without offsets.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.pl
- a progress logger used while loading the graph, ornull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
loadOffline
Creates a newImmutableGraph
by loading offline a graph file.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
loadOffline
public static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException Creates a newImmutableGraph
by loading offline a graph file.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.pl
- a progress logger used while loading the graph, ornull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
loadMapped
public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOException Creates a newImmutableGraph
by memory-mapping a graph file.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.pl
- a progress logger used while loading the offsets, ornull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while memory-mapping the graph or reading the offsets.
-
loadMapped
Creates a newImmutableGraph
by memory-mapping a graph file.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while memory-mapping the graph or reading the offsets.
-
loadOnce
Creates a newImmutableGraph
by loading a read-once graph from an input stream.- Parameters:
is
- an input stream containing the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.UnsupportedOperationException
- if this graph class does not support read-once graphs.- Implementation Specification:
- This implementation just throws a
UnsupportedOperationException
. There is no way to write a generic implementation, because there is no way to know in advance the class that should read the graph.
-
load
Creates a newImmutableGraph
by loading a graph file from disk to memory, with all offsets, using no progress logger.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
load
Creates a newImmutableGraph
by loading a graph file from disk to memory, with all offsets, using a progress logger.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.pl
- a progress logger used while loading the graph, ornull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
load
protected static ImmutableGraph load(ImmutableGraph.LoadMethod method, CharSequence basename, InputStream is, ProgressLogger pl) throws IOException Creates a new immutable graph by loading a graph file from disk to memory, delegating the actual loading to the class specified in thegraphclass
property within the property file (namedbasename.properties
). The exact load method to be used depends on themethod
argument.This method uses the properties convention described in the introduction.
- Parameters:
method
- the method to be used to load the graph.basename
- the basename of the graph, ifmethod
is notImmutableGraph.LoadMethod.ONCE
.is
- an input stream the containing the graph, ifmethod
isImmutableGraph.LoadMethod.ONCE
.pl
- the progress logger; it can benull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
store
public static void store(Class<?> graphClass, ImmutableGraph graph, CharSequence basename, ProgressLogger pl) throws IOException Stores an immutable graph using a specified subclass and a progress logger.This method is a useful shorthand that invoke by reflection the store method of a given subclass. Note, however, that usually a subclass will provide more refined store methods with more parameters.
- Parameters:
graphClass
- the subclass ofImmutableGraph
that should store the graph.graph
- the graph to store.basename
- the basename.pl
- a progress logger, ornull
.- Throws:
IOException
-
store
public static void store(Class<?> graphClass, ImmutableGraph graph, CharSequence basename) throws IOException Stores an immutable graph using a specified subclass.- Parameters:
graphClass
- the subclass ofImmutableGraph
that should store the graph.graph
- the graph to store.basename
- the basename.- Throws:
IOException
- See Also:
-
equals
Compare this immutable graph to another object. -
hashCode
public int hashCode()Returns a hash code for this immutable graph. -
wrap
-
wrap
-
loadOffline(CharSequence)
orloadMapped(CharSequence)
instead.