Class ScatteredArcsASCIIGraph

All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>

public class ScatteredArcsASCIIGraph
extends ImmutableSequentialGraph
An ImmutableGraph that corresponds to a graph stored as a scattered list of arcs.

A scattered list of arcs describes a graph in a fairly loose way. Each line contains an arc specified as two node identifiers separated by whitespace (but we suggest exactly one TAB character). Sources and targets can be in any order.

In the standard description, node identifiers can be in the range [-263..263): they will be remapped in a compact identifier space by assigning to each newly seen identifier a new node number. The list of identifiers in order of appearance is available in ids. Lines can be empty, or comments starting with #. Characters following the target will be discarded with a warning.

Warning: Lines not conforming the above specification will cause an error to be logged, but will be otherwise ignored.

Alternatively, you can provide an Object2LongFunction<String> with default return value -1 that will be used to map identifiers to node numbers, along with a Charset to parse lines and the number of nodes of the graph (which must be a strict upper bound for the largest value returned by the function).

A faster conversion can be obtained by providing an Object2LongFunction<byte[]>: in this case, no character decoding is performed, and keys are bytes sequences. This technique goes hand in hand with classes such as GOV3Function, which can store efficiently a map from byte arrays to longs.

Additionally, the resulting graph can be symmetrized, and its loops be removed, using suitable constructor options.

This class has no load method, and its main method converts a scattered-arcs representation directly into a BVGraph.

Using ScatteredArcsASCIIGraph to convert your data

A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs specified by scattered arcs. Suppose you create the following file, named example.arcs:

  # My graph
  -1 15
  15 2
  2 -1 This will cause a warning to be logged
  OOPS! (This will cause an error to be logged)
  -1 2
 
Then, the command
  java it.unimi.dsi.webgraph.ScatteredArcsASCIIGraph example <example.arcs
 
will produce a compressed graph in BVGraph format with basename example. The file example.ids will contain the list of longs -1, 15, 2. The node with identifer -1 will be the node 0 in the output graph, the node with identifier 15 will be node 1, and the node with identifier 2 will be node 2. The graph example will thus have three nodes and four arcs (viz., <0,1>, <0,2>, <1,2> and <2,0>).

Memory requirements

To convert node identifiers to node numbers, instances of this class use a custom map that in the worst case will require 25.5×2⌈log(4n/3)⌉ ≤ 68n bytes, where n is the number of distinct identifiers. Storing batches of arcs in memory requires 16 bytes per arc.

  • Field Details

    • DEFAULT_BATCH_SIZE

      public static final int DEFAULT_BATCH_SIZE
      The default batch size.
      See Also:
      Constant Field Values
    • ids

      public long[][] ids
      The big-array list of identifiers in order of appearance.
  • Constructor Details

    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, boolean symmetrize) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, boolean symmetrize, boolean noLoops) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, boolean symmetrize, boolean noLoops, int batchSize) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<? extends CharSequence> function, Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<byte[]> function, long n) throws IOException
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from byte arrays representing nodes to node numbers.
      n - the number of nodes of the graph (used only if function is not null).
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize) throws IOException
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from byte arrays representing nodes to node numbers.
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops) throws IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from byte arrays representing nodes to node numbers.
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize) throws IOException
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from byte arrays representing nodes to node numbers.
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir) throws IOException
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from byte arrays representing nodes to node numbers.
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Throws:
      IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(InputStream is, Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from byte arrays representing nodes to node numbers.
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Throws:
      IOException
  • Method Details