Class ImmutableGraph
- All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>
- Direct Known Subclasses:
ArcLabelledImmutableGraph
,BVGraph
,EFGraph
,ImmutableSequentialGraph
,ImmutableSubgraph
,UnionImmutableGraph
public abstract class ImmutableGraph extends Object implements FlyweightPrototype<ImmutableGraph>
Subclasses of this class are used to create and access immutable graphs, that is, graphs that are computed once for all, stored conveniently, and then accessed repeatedly. Moreover, immutable graphs are usually very large—so large that two such graphs may not fit into central memory (the main example being a sizable portion of the web).
A subclass of this class must implement methods to obtain the number of nodes, the outdegree of a
node and the successors of a node (either successors(int)
or successorArray(int)
). Additionally, it may provide methods to
obtain the number of arcs, and a basename.
This class provides equals(Object)
and hashCode()
methods that consider
two graph equals if they have the same size and all their successor lists are equal.
Iterating on successors
Starting with WebGraph 2.0, the iterator architecture is fully lazy—you have no
hasNext()
method. Rather, the LazyIntIterator
returned by successors(int)
will return -1 when no more successors are available. The idiomatic forms for enumerating successors
via iterators are
LazyIntIterator successors = g.successors(x); int d = g.outdegree(x); while(d-- != 0) doSomething(successors.nextInt());and
LazyIntIterator successors = g.successors(x); int t; while((t = successors.nextInt()) != -1) doSomething(t);
The alternative method successorArray(int)
provides an array containing the successors
and possibly more elements. Use outdegree(int)
to know how many elements are valid.
The efficiency of successors(int)
and successorArray(int)
may vary depending on the
implementation.
Iterating on a graph in parallel
You can scan a graph sequentially using node iterators. Starting with version 3.5.0,
implementations of this class may return true on hasCopiableIterators()
, which means that
node iterators implement the optional copy(int)
method. Using copy(int)
,
the method splitNodeIterators(int)
of this class is able to provide separate, thread-safe iterators on different segments
of contiguous nodes of the graph. The class BVGraph
, for example, uses this interface to provide
parallel compression. We suggest that all classes providing parallel iteration read the system variable
"it.unimi.dsi.webgraph.threads" to override the number of parallel threads.
Building an immutable graph
Due to their large size, immutable
graphs have a peculiar serialisation scheme. Every subclass of this class
must implement a number of static methods that create an immutable
graph, given a string (usually a basename for a set of files) and, optionally, a ProgressLogger
.
The signatures that must be implemented are
ImmutableGraph load(CharSequence, ProgressLogger)
;ImmutableGraph load(CharSequence)
;ImmutableGraph loadOffline(CharSequence, ProgressLogger)
;ImmutableGraph loadOffline(CharSequence)
.ImmutableGraph loadOnce(InputStream)
;
Additionally, the following signatures can be implemented:
ImmutableGraph loadMapped(CharSequence, ProgressLogger)
;ImmutableGraph loadMapped(CharSequence)
;
The special semantics associated to loadOffline()
is that the immutable graph should be set up, and possibly some metadata could be read from disk, but no
actual data is loaded into memory; the class should guarantee that offline sequential access (i.e., by means
of nodeIterator(int)
) is still possible. In other words, in most cases nodeIterator(int)
will have to be
overridden by the subclasses to behave properly even in an offline setting (see nodeIterator()
).
The special semantics associated with loadOnce()
is that the graph can be traversed
just once using a call to nodeIterator()
. The special semantics associated with loadMapped()
is that metadata could be read from disk, but the graph will be accessed by memory mapping; the class
should guarantee that random access is possible.
Note that a simple class may just implement all special forms of graph loading delegating to the standard
load method (see, e.g., ASCIIGraph
).
Specific implementations of ImmutableGraph
may also decide to expose internal load methods
to make it easier to write load methods for subclasses
(see, e.g., loadInternal()
).
Analogously, a subclass of this class may also implement
store(ImmutableGraph, CharSequence, ProgressLogger)
;store(ImmutableGraph, CharSequence)
.
store
methods are available, as parameters vary wildly
from subclass to subclass. The method store(Class, ImmutableGraph, CharSequence, ProgressLogger)
invokes by reflection the methods above on the provided class.
The standard method to build a new immutable graph is creating a (possibly anonymous) class
that extends this class, and save it using a concrete subclass (e.g., BVGraph
). See
the source of Transform
for several examples.
Properties Conventions
To provide a simple way to load an immutable graph without knowing in advance its class,
the following convention may be followed: a graph with basename name
may feature
a Java property file name.properties
with a property graphclass
containing the actual class of the graph. In this case, you can use the implementation of the load/store
methods contained in this class, similarly to the standard Java serialisation scheme. BVGraph
, for instance,
follows this convention, but ASCIIGraph
does not.
The reason why this convention is not enforced is that it is sometimes useful to write lightweight classes,
mostly for debugging purposes, whose graph representation is entirely contained in a single file (e.g., ASCIIGraph
),
so that loadOnce(InputStream)
can be easily implemented.
Facilities for loading an immutable graph
ImmutableGraph
provides ready-made implementations of the load methods that work as follows: they
opens a property file with the given basename, and look for the graphclass
property; then, they simply
delegates the actual load to the specified graph class by reflection.
Thread-safety and flyweight copies
Implementations of this class need not be thread-safe. However, they implement the
FlyweightPrototype
pattern: the copy()
method is
thread-safe and will return a lightweight copy of the graph—usually, all immutable
data will be shared between copies. Concurrent access to different copies is safe.
Note that by contract copy()
is guaranteed to work only if randomAccess()
returns true.
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
ImmutableGraph.LoadMethod
A list of the methods that can be used to load a graph. -
Field Summary
Fields Modifier and Type Field Description static String
GRAPHCLASS_PROPERTY_KEY
static String
NUMBER_OF_THREADS_PROPERTY
The property used to set the number of parallel compression threads.static String
PROPERTIES_EXTENSION
The standard extension of property files. -
Constructor Summary
Constructors Constructor Description ImmutableGraph()
-
Method Summary
Modifier and Type Method Description CharSequence
basename()
Returns a symbolic basename for this graph (optional operation).abstract ImmutableGraph
copy()
Returns a flyweight copy of this immutable graph.boolean
equals(Object o)
Compare this immutable graph to another object.boolean
hasCopiableIterators()
Whether the node iterators returned by this graph supportNodeIterator.copy(int)
.int
hashCode()
Returns a hash code for this immutable graph.protected static ImmutableGraph
load(ImmutableGraph.LoadMethod method, CharSequence basename, InputStream is, ProgressLogger pl)
Creates a new immutable graph by loading a graph file from disk to memory, delegating the actual loading to the class specified in thegraphclass
property within the property file (namedbasename.properties
).static ImmutableGraph
load(CharSequence basename)
Creates a newImmutableGraph
by loading a graph file from disk to memory, with all offsets, using no progress logger.static ImmutableGraph
load(CharSequence basename, ProgressLogger pl)
Creates a newImmutableGraph
by loading a graph file from disk to memory, with all offsets, using a progress logger.static ImmutableGraph
loadMapped(CharSequence basename)
Creates a newImmutableGraph
by memory-mapping a graph file.static ImmutableGraph
loadMapped(CharSequence basename, ProgressLogger pl)
Creates a newImmutableGraph
by memory-mapping a graph file.static ImmutableGraph
loadOffline(CharSequence basename)
Creates a newImmutableGraph
by loading offline a graph file.static ImmutableGraph
loadOffline(CharSequence basename, ProgressLogger pl)
Creates a newImmutableGraph
by loading offline a graph file.static ImmutableGraph
loadOnce(InputStream is)
Creates a newImmutableGraph
by loading a read-once graph from an input stream.static ImmutableGraph
loadSequential(CharSequence basename)
Deprecated.static ImmutableGraph
loadSequential(CharSequence basename, ProgressLogger pl)
Deprecated.NodeIterator
nodeIterator()
Returns a node iterator for scanning the graph sequentially, starting from the first node.NodeIterator
nodeIterator(int from)
Returns a node iterator for scanning the graph sequentially, starting from the given node.long
numArcs()
Returns the number of arcs of this graph (optional operation).abstract int
numNodes()
Returns the number of nodes of this graph.abstract int
outdegree(int x)
Returns the outdegree of a node.IntIterator
outdegrees()
Returns an iterator enumerating the outdegrees of the nodes of this graph.abstract boolean
randomAccess()
Checks whether this graph provides random access to successor lists.NodeIterator[]
splitNodeIterators(int howMany)
Returns an array of node iterators, scanning each a portion of the nodes of a graph.static void
store(Class<?> graphClass, ImmutableGraph graph, CharSequence basename)
Stores an immutable graph using a specified subclass.static void
store(Class<?> graphClass, ImmutableGraph graph, CharSequence basename, ProgressLogger pl)
Stores an immutable graph using a specified subclass and a progress logger.int[]
successorArray(int x)
Returns a reference to an array containing the successors of a given node.LazyIntIterator
successors(int x)
Returns a lazy iterator over the successors of a given node.String
toString()
-
Field Details
-
GRAPHCLASS_PROPERTY_KEY
- See Also:
- Constant Field Values
-
PROPERTIES_EXTENSION
The standard extension of property files.- See Also:
- Constant Field Values
-
NUMBER_OF_THREADS_PROPERTY
The property used to set the number of parallel compression threads.- See Also:
- Constant Field Values
-
-
Constructor Details
-
ImmutableGraph
public ImmutableGraph()
-
-
Method Details
-
numNodes
public abstract int numNodes()Returns the number of nodes of this graph.Albeit this method is not optional, it is allowed that this method throws an
UnsupportedOperationException
if this graph has never been entirely traversed using anode iterator
. This apparently bizarre behaviour is necessary to support implementations asArcListASCIIGraph
, which do not know the actual number of nodes until a traversal has been completed.- Returns:
- the number of nodes.
-
numArcs
public long numArcs()Returns the number of arcs of this graph (optional operation).- Returns:
- the number of arcs.
-
randomAccess
public abstract boolean randomAccess()Checks whether this graph provides random access to successor lists.- Returns:
- true if this graph provides random access to successor lists.
-
hasCopiableIterators
public boolean hasCopiableIterators()Whether the node iterators returned by this graph supportNodeIterator.copy(int)
.- Returns:
- true if this graph provides copiable iterators.
- Implementation Specification:
- This implementation just returns
randomAccess()
.
-
basename
Returns a symbolic basename for this graph (optional operation).- Returns:
- the basename.
- Implementation Notes:
- Implementors of this class may provide a basename (usually a pathname from which various files storing the graph are stemmed). This method is optional because it is sometimes unmeaningful (e.g., for one-off anonymous classes).
-
successors
Returns a lazy iterator over the successors of a given node. The iteration terminates when -1 is returned.- Parameters:
x
- a node.- Returns:
- a lazy iterator over the successors of the node.
- API Notes:
- The semantics of this method has been significantly modified in WebGraph 2.0 to take advantage of the new, faster lazy architecture.
- Implementation Specification:
- This implementation just wraps the array returned by
successorArray(int)
. Subclasses are encouraged to override this implementation.
-
successorArray
public int[] successorArray(int x)Returns a reference to an array containing the successors of a given node.The returned array may contain more entries than the outdegree of
x
. However, only those with indices from 0 (inclusive) to the outdegree ofx
(exclusive) contain valid data.- Parameters:
x
- a node.- Returns:
- an array whose first elements are the successors of the node; the array must not be modified by the caller.
- Implementation Specification:
- This implementation just unwraps the iterator returned by
successors(int)
. Subclasses are encouraged to override this implementation. - Implementation Notes:
- All implementations must guarantee that a distinct array is returned for each node. The caller, in turn, must treat the array as a read-only object.
-
outdegree
public abstract int outdegree(int x)Returns the outdegree of a node.- Parameters:
x
- a node.- Returns:
- the outdegree of the given node.
- Throws:
IllegalStateException
- if called without offsets.
-
nodeIterator
Returns a node iterator for scanning the graph sequentially, starting from the given node.- Parameters:
from
- the node from which the iterator will iterate.- Returns:
- a
NodeIterator
for accessing nodes and successors sequentially. - Implementation Specification:
- This implementation just calls the random-access methods (
successors(int)
andoutdegree(int)
). More specific implementations may choose to maintain some extra state to make the enumeration more efficient.
-
nodeIterator
Returns a node iterator for scanning the graph sequentially, starting from the first node.- Returns:
- a
NodeIterator
for accessing nodes and successors sequentially.
-
splitNodeIterators
Returns an array of node iterators, scanning each a portion of the nodes of a graph. Iterators are guaranteed to scan mutually disjoint sets of nodes, and every node is guaranteed to be scanned by one iterator.This is an optional operation. If implemented, though, the returned iterators must properly implement
NodeIterator.copy(int)
.- Parameters:
howMany
- the number of iterators to be returned (at the end of the array, some of them may be empty).- Returns:
- the required iterators.
-
copy
Returns a flyweight copy of this immutable graph.- Specified by:
copy
in interfaceFlyweightPrototype<ImmutableGraph>
- Returns:
- a flyweight copy of this immutable graph.
- Throws:
UnsupportedOperationException
- if flyweight copies are not supported: support is guaranteed only ifrandomAccess()
returns true.- See Also:
FlyweightPrototype
-
toString
-
outdegrees
Returns an iterator enumerating the outdegrees of the nodes of this graph.- Returns:
- an iterator enumerating the outdegrees of the nodes of this graph.
-
loadSequential
Deprecated.UseloadOffline(CharSequence)
orloadMapped(CharSequence)
instead.Creates a newImmutableGraph
by loading a graph file from disk to memory, without offsets.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
loadSequential
@Deprecated public static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOExceptionDeprecated.Creates a newImmutableGraph
by loading a graph file from disk to memory, without offsets.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.pl
- a progress logger used while loading the graph, ornull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
loadOffline
Creates a newImmutableGraph
by loading offline a graph file.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
loadOffline
public static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOExceptionCreates a newImmutableGraph
by loading offline a graph file.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.pl
- a progress logger used while loading the graph, ornull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
loadMapped
public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOExceptionCreates a newImmutableGraph
by memory-mapping a graph file.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.pl
- a progress logger used while loading the offsets, ornull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while memory-mapping the graph or reading the offsets.
-
loadMapped
Creates a newImmutableGraph
by memory-mapping a graph file.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while memory-mapping the graph or reading the offsets.
-
loadOnce
Creates a newImmutableGraph
by loading a read-once graph from an input stream.- Parameters:
is
- an input stream containing the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.UnsupportedOperationException
- if this graph class does not support read-once graphs.- Implementation Specification:
- This implementation just throws a
UnsupportedOperationException
. There is no way to write a generic implementation, because there is no way to know in advance the class that should read the graph.
-
load
Creates a newImmutableGraph
by loading a graph file from disk to memory, with all offsets, using no progress logger.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
load
Creates a newImmutableGraph
by loading a graph file from disk to memory, with all offsets, using a progress logger.This method uses the properties convention described in the introduction.
- Parameters:
basename
- the basename of the graph.pl
- a progress logger used while loading the graph, ornull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
load
protected static ImmutableGraph load(ImmutableGraph.LoadMethod method, CharSequence basename, InputStream is, ProgressLogger pl) throws IOExceptionCreates a new immutable graph by loading a graph file from disk to memory, delegating the actual loading to the class specified in thegraphclass
property within the property file (namedbasename.properties
). The exact load method to be used depends on themethod
argument.This method uses the properties convention described in the introduction.
- Parameters:
method
- the method to be used to load the graph.basename
- the basename of the graph, ifmethod
is notImmutableGraph.LoadMethod.ONCE
.is
- an input stream the containing the graph, ifmethod
isImmutableGraph.LoadMethod.ONCE
.pl
- the progress logger; it can benull
.- Returns:
- an
ImmutableGraph
containing the specified graph. - Throws:
IOException
- if an I/O exception occurs while reading the graph.
-
store
public static void store(Class<?> graphClass, ImmutableGraph graph, CharSequence basename, ProgressLogger pl) throws IOExceptionStores an immutable graph using a specified subclass and a progress logger.This method is a useful shorthand that invoke by reflection the store method of a given subclass. Note, however, that usually a subclass will provide more refined store methods with more parameters.
- Parameters:
graphClass
- the subclass ofImmutableGraph
that should store the graph.graph
- the graph to store.basename
- the basename.pl
- a progress logger, ornull
.- Throws:
IOException
-
store
public static void store(Class<?> graphClass, ImmutableGraph graph, CharSequence basename) throws IOExceptionStores an immutable graph using a specified subclass.- Parameters:
graphClass
- the subclass ofImmutableGraph
that should store the graph.graph
- the graph to store.basename
- the basename.- Throws:
IOException
- See Also:
store(Class, ImmutableGraph, CharSequence, ProgressLogger)
-
equals
Compare this immutable graph to another object. -
hashCode
public int hashCode()Returns a hash code for this immutable graph.
-
loadOffline(CharSequence)
orloadMapped(CharSequence)
instead.