Class ScatteredArcsASCIIGraph
- All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>
public class ScatteredArcsASCIIGraph extends ImmutableSequentialGraph
ImmutableGraph
that corresponds to a graph stored as a scattered list of arcs.
A scattered list of arcs describes a graph in a fairly loose way. Each line contains an arc specified as two node identifiers separated by whitespace (but we suggest exactly one TAB character). Sources and targets can be in any order.
In the standard description, node identifiers can be in the range
[-263..263): they will be remapped in a compact identifier space by
assigning to each newly seen identifier a new node number. The list of identifiers in order of
appearance is available in ids
. Lines can be empty, or comments starting with
#
. Characters following the target will be discarded with a warning.
Warning: Lines not conforming the above specification will cause an error to be logged, but will be otherwise ignored.
Alternatively, you can
provide an Object2LongFunction<String>
with default return value
-1 that will be used to map identifiers to node numbers, along with a Charset
to parse
lines and the number of nodes of the graph (which must be a strict upper bound for the largest
value returned by the function). Note that in principle an Object2IntFunction
would be
sufficient, but we want to make easier using functions from Sux4J such as GOV3Function
.
Additionally, the resulting graph can be symmetrized, and its loops be removed, using suitable constructor options.
This class has no load method, and its main method converts a scattered-arcs representation
directly into a BVGraph
.
Using ScatteredArcsASCIIGraph
to convert your data
A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs
specified by scattered arcs. Suppose you create the following file, named
example.arcs
:
# My graph -1 15 15 2 2 -1 This will cause a warning to be logged OOPS! (This will cause an error to be logged) -1 2Then, the command
java it.unimi.dsi.webgraph.ScatteredArcsASCIIGraph example <example.arcswill produce a compressed graph in
BVGraph
format with basename
example
. The file example.ids
will contain the list of longs -1, 15, 2.
The node with identifer -1 will be the node 0 in the output graph, the node with identifier 15
will be node 1, and the node with identifier 2 will be node 2. The graph example
will thus have three nodes and four arcs (viz., <0,1>, <0,2>, <1,2> and
<2,0>).
Memory requirements
To convert node identifiers to node numbers, instances of this class use a custom map that in the worst case will require 19.5×2⌈log(4n/3)⌉ ≤ 52n bytes, where n is the number of distinct identifiers. Storing batches of arcs in memory requires 8 bytes per arc.
-
Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.dsi.webgraph.ImmutableGraph
ImmutableGraph.LoadMethod
-
Field Summary
Fields Modifier and Type Field Description static int
DEFAULT_BATCH_SIZE
The default batch size.long[]
ids
The list of identifiers in order of appearance.Fields inherited from class it.unimi.dsi.webgraph.ImmutableGraph
GRAPHCLASS_PROPERTY_KEY, NUMBER_OF_THREADS_PROPERTY, PROPERTIES_EXTENSION
-
Constructor Summary
Constructors Constructor Description ScatteredArcsASCIIGraph(InputStream is)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize, boolean noLoops)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize, boolean noLoops, int batchSize)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl)
Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph(Iterator<long[]> arcs, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl)
Creates a scattered-arcs ASCII graph. -
Method Summary
Modifier and Type Method Description ScatteredArcsASCIIGraph
copy()
Throws anUnsupportedOperationException
.boolean
hasCopiableIterators()
Whether the node iterators returned by this graph supportNodeIterator.copy(int)
.static void
main(String[] args)
NodeIterator
nodeIterator(int from)
Returns a node iterator for scanning the graph sequentially, starting from the given node.long
numArcs()
Returns the number of arcs of this graph (optional operation).int
numNodes()
Returns the number of nodes of this graph.Methods inherited from class it.unimi.dsi.webgraph.ImmutableSequentialGraph
outdegree, randomAccess, successorArray
Methods inherited from class it.unimi.dsi.webgraph.ImmutableGraph
basename, equals, hashCode, load, load, load, loadMapped, loadMapped, loadOffline, loadOffline, loadOnce, loadSequential, loadSequential, nodeIterator, outdegrees, splitNodeIterators, store, store, successors, toString
-
Field Details
-
DEFAULT_BATCH_SIZE
public static final int DEFAULT_BATCH_SIZEThe default batch size.- See Also:
- Constant Field Values
-
ids
public long[] idsThe list of identifiers in order of appearance.
-
-
Constructor Details
-
ScatteredArcsASCIIGraph
Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.- Throws:
IOException
-
ScatteredArcsASCIIGraph
Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.pl
- a progress logger, ornull
.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize, boolean noLoops) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize, boolean noLoops, int batchSize) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, int n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.pl
- a progress logger, ornull
.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(Iterator<long[]> arcs, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOExceptionCreates a scattered-arcs ASCII graph.- Parameters:
arcs
- an iterator returning the arcs as two-element arrays.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.pl
- a progress logger, ornull
.- Throws:
IOException
-
-
Method Details
-
numNodes
public int numNodes()Description copied from class:ImmutableGraph
Returns the number of nodes of this graph.Albeit this method is not optional, it is allowed that this method throws an
UnsupportedOperationException
if this graph has never been entirely traversed using anode iterator
. This apparently bizarre behaviour is necessary to support implementations asArcListASCIIGraph
, which do not know the actual number of nodes until a traversal has been completed.- Specified by:
numNodes
in classImmutableGraph
- Returns:
- the number of nodes.
-
numArcs
public long numArcs()Description copied from class:ImmutableGraph
Returns the number of arcs of this graph (optional operation).- Overrides:
numArcs
in classImmutableGraph
- Returns:
- the number of arcs.
-
nodeIterator
Description copied from class:ImmutableGraph
Returns a node iterator for scanning the graph sequentially, starting from the given node.- Overrides:
nodeIterator
in classImmutableSequentialGraph
- Parameters:
from
- the node from which the iterator will iterate.- Returns:
- a
NodeIterator
for accessing nodes and successors sequentially.
-
hasCopiableIterators
public boolean hasCopiableIterators()Description copied from class:ImmutableGraph
Whether the node iterators returned by this graph supportNodeIterator.copy(int)
.- Overrides:
hasCopiableIterators
in classImmutableGraph
- Returns:
- true if this graph provides copiable iterators.
-
copy
Description copied from class:ImmutableSequentialGraph
Throws anUnsupportedOperationException
.- Specified by:
copy
in interfaceFlyweightPrototype<ImmutableGraph>
- Overrides:
copy
in classImmutableSequentialGraph
- Returns:
- a flyweight copy of this immutable graph.
- See Also:
FlyweightPrototype
-
main
public static void main(String[] args) throws IllegalArgumentException, SecurityException, IOException, JSAPException, ClassNotFoundException
-