Class ScatteredArcsASCIIGraph
- All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>
ImmutableGraph
that corresponds to a graph stored as a scattered list of arcs.
A scattered list of arcs describes a graph in a fairly loose way. Each line contains an arc specified as two node identifiers separated by whitespace (but we suggest exactly one TAB character). Sources and targets can be in any order.
In the standard description, node identifiers can be in the range
[-263..263): they will be remapped in a compact identifier space by
assigning to each newly seen identifier a new node number. The list of identifiers in order of
appearance is available in ids
. Lines can be empty, or comments starting with
#
. Characters following the target will be discarded with a warning.
Warning: Lines not conforming the above specification will cause an error to be logged, but will be otherwise ignored.
Alternatively, you can
provide an Object2LongFunction<String>
with default return value
-1 that will be used to map identifiers to node numbers, along with a Charset
to parse
lines and the number of nodes of the graph (which must be a strict upper bound for the largest
value returned by the function).
A faster conversion can be obtained by
providing an Object2LongFunction<byte[]>
: in this case, no
character decoding is performed, and keys are bytes sequences. This technique goes hand in hand
with classes such as GOV3Function
, which can store efficiently a map from byte arrays to
longs.
Additionally, the resulting graph can be symmetrized, and its loops be removed, using suitable constructor options.
This class has no load method, and its main method converts a scattered-arcs representation
directly into a BVGraph
.
Using ScatteredArcsASCIIGraph
to convert your data
A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs
specified by scattered arcs. Suppose you create the following file, named
example.arcs
:
# My graph -1 15 15 2 2 -1 This will cause a warning to be logged OOPS! (This will cause an error to be logged) -1 2Then, the command
java it.unimi.dsi.webgraph.ScatteredArcsASCIIGraph example <example.arcswill produce a compressed graph in
BVGraph
format with basename
example
. The file example.ids
will contain the list of longs -1, 15, 2.
The node with identifer -1 will be the node 0 in the output graph, the node with identifier 15
will be node 1, and the node with identifier 2 will be node 2. The graph example
will thus have three nodes and four arcs (viz., <0,1>, <0,2>, <1,2> and
<2,0>).
Memory requirements
To convert node identifiers to node numbers, instances of this class use a custom map that in the worst case will require 25.5×2⌈log(4n/3)⌉ ≤ 68n bytes, where n is the number of distinct identifiers. Storing batches of arcs in memory requires 16 bytes per arc.
-
Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.dsi.big.webgraph.ImmutableGraph
ImmutableGraph.LoadMethod
-
Field Summary
Modifier and TypeFieldDescriptionstatic final int
The default batch size.long[][]
The big-array list of identifiers in order of appearance.Fields inherited from class it.unimi.dsi.big.webgraph.ImmutableGraph
GRAPHCLASS_PROPERTY_KEY, NUMBER_OF_THREADS_PROPERTY, PROPERTIES_EXTENSION
-
Constructor Summary
ConstructorDescriptionCreates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, boolean symmetrize) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, boolean symmetrize, boolean noLoops) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, boolean symmetrize, boolean noLoops, int batchSize) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<byte[]> function, long n) Creates a scattered-arcs ASCII graph using a function on byte arrays.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize) Creates a scattered-arcs ASCII graph using a function on byte arrays.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops) Creates a scattered-arcs ASCII graph.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize) Creates a scattered-arcs ASCII graph using a function on byte arrays.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) Creates a scattered-arcs ASCII graph using a function on byte arrays.ScatteredArcsASCIIGraph
(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) Creates a scattered-arcs ASCII graph using a function on byte arrays. -
Method Summary
Modifier and TypeMethodDescriptioncopy()
Throws anUnsupportedOperationException
.boolean
Whether the node iterators returned by this graph supportNodeIterator.copy(long)
.static void
nodeIterator
(long from) Returns a node iterator for scanning the graph sequentially, starting from the given node.long
numArcs()
Returns the number of arcs of this graph (optional operation).long
numNodes()
Returns the number of nodes of this graph.Methods inherited from class it.unimi.dsi.big.webgraph.ImmutableSequentialGraph
outdegree, randomAccess, successorBigArray
Methods inherited from class it.unimi.dsi.big.webgraph.ImmutableGraph
basename, equals, hashCode, intNumNodes, load, load, load, loadMapped, loadMapped, loadOffline, loadOffline, loadOnce, loadSequential, loadSequential, nodeIterator, outdegrees, splitNodeIterators, store, store, successors, toString, wrap, wrap
-
Field Details
-
DEFAULT_BATCH_SIZE
public static final int DEFAULT_BATCH_SIZEThe default batch size.- See Also:
-
ids
public long[][] idsThe big-array list of identifiers in order of appearance.
-
-
Constructor Details
-
ScatteredArcsASCIIGraph
Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.- Throws:
IOException
-
ScatteredArcsASCIIGraph
Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a standard scattered list of arcs.symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.pl
- a progress logger, ornull
.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from string representing nodes to node numbers, ornull
for the standard behaviour.charset
- a character set that will be used to read the identifiers passed tofunction
, ornull
for ISO-8859-1 (used only iffunction
is notnull
).n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.pl
- a progress logger, ornull
.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<byte[]> function, long n) throws IOException Creates a scattered-arcs ASCII graph using a function on byte arrays.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from byte arrays representing nodes to node numbers.n
- the number of nodes of the graph (used only iffunction
is notnull
).- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize) throws IOException Creates a scattered-arcs ASCII graph using a function on byte arrays.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from byte arrays representing nodes to node numbers.n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops) throws IOException Creates a scattered-arcs ASCII graph.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from byte arrays representing nodes to node numbers.n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize) throws IOException Creates a scattered-arcs ASCII graph using a function on byte arrays.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from byte arrays representing nodes to node numbers.n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) throws IOException Creates a scattered-arcs ASCII graph using a function on byte arrays.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from byte arrays representing nodes to node numbers.n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.- Throws:
IOException
-
ScatteredArcsASCIIGraph
public ScatteredArcsASCIIGraph(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOException Creates a scattered-arcs ASCII graph using a function on byte arrays.- Parameters:
is
- an input stream containing a scattered list of arcs.function
- an explicitly provided function from byte arrays representing nodes to node numbers.n
- the number of nodes of the graph (used only iffunction
is notnull
).symmetrize
- the new graph will be forced to be symmetric.noLoops
- the new graph will have no loops.batchSize
- the number of integers in a batch; two arrays of integers of this size will be allocated by this method.tempDir
- a temporary directory for the batches, ornull
forFile.createTempFile(java.lang.String, java.lang.String)
's choice.pl
- a progress logger, ornull
.- Throws:
IOException
-
-
Method Details
-
numNodes
public long numNodes()Description copied from class:ImmutableGraph
Returns the number of nodes of this graph.Albeit this method is not optional, it is allowed that this method throws an
UnsupportedOperationException
if this graph has never been entirely traversed using anode iterator
. This apparently bizarre behaviour is necessary to support implementations asArcListASCIIGraph
, which do not know the actual number of nodes until a traversal has been completed.- Specified by:
numNodes
in classImmutableGraph
- Returns:
- the number of nodes.
-
numArcs
public long numArcs()Description copied from class:ImmutableGraph
Returns the number of arcs of this graph (optional operation).- Overrides:
numArcs
in classImmutableGraph
- Returns:
- the number of arcs.
-
nodeIterator
Description copied from class:ImmutableGraph
Returns a node iterator for scanning the graph sequentially, starting from the given node.- Overrides:
nodeIterator
in classImmutableSequentialGraph
- Parameters:
from
- the node from which the iterator will iterate.- Returns:
- a
NodeIterator
for accessing nodes and successors sequentially.
-
hasCopiableIterators
public boolean hasCopiableIterators()Description copied from class:ImmutableGraph
Whether the node iterators returned by this graph supportNodeIterator.copy(long)
.- Overrides:
hasCopiableIterators
in classImmutableGraph
- Returns:
- true if this graph provides copiable iterators.
-
copy
Description copied from class:ImmutableSequentialGraph
Throws anUnsupportedOperationException
.- Specified by:
copy
in interfaceFlyweightPrototype<ImmutableGraph>
- Overrides:
copy
in classImmutableSequentialGraph
- Returns:
- a flyweight copy of this immutable graph.
- See Also:
-
main
public static void main(String[] args) throws IllegalArgumentException, SecurityException, IOException, JSAPException, ClassNotFoundException
-