Class ScatteredArcsASCIIGraph

java.lang.Object
All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>

public class ScatteredArcsASCIIGraph
extends ImmutableSequentialGraph
An ImmutableGraph that corresponds to a graph stored as a scattered list of arcs.

A scattered list of arcs describes a graph in a fairly loose way. Each line contains an arc specified as two node identifiers separated by whitespace (but we suggest exactly one TAB character). Sources and targets can be in any order.

In the standard description, node identifiers can be in the range [-263..263): they will be remapped in a compact identifier space by assigning to each newly seen identifier a new node number. The list of identifiers in order of appearance is available in ids. Lines can be empty, or comments starting with #. Characters following the target will be discarded with a warning.

Warning: Lines not conforming the above specification will cause an error to be logged, but will be otherwise ignored.

Alternatively, you can provide an Object2LongFunction<String> with default return value -1 that will be used to map identifiers to node numbers, along with a Charset to parse lines and the number of nodes of the graph (which must be a strict upper bound for the largest value returned by the function).

Additionally, the resulting graph can be symmetrized, and its loops be removed, using suitable constructor options.

This class has no load method, and its main method converts an scattered-arcs representation directly into a BVGraph.

Using ScatteredArcsASCIIGraph to convert your data

A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs specified by scattered arcs. Suppose you create the following file, named example.arcs:

  # My graph
  -1 15
  15 2
  2 -1 This will cause a warning to be logged
  OOPS! (This will cause an error to be logged)
  -1 2
  
Then, the command
  java it.unimi.dsi.webgraph.ScatteredArcsASCIIGraph example <example.arcs
  
will produce a compressed graph in BVGraph format with basename example. The file example.ids will contain the list of longs -1, 15, 2. The node with identifer -1 will be the node 0 in the output graph, the node with identifier 15 will be node 1, and the node with identifier 2 will be node 2. The graph example will thus have three nodes and four arcs (viz., <0,1>, <0,2>, <1,2> and <2,0>).

Memory requirements

To convert node identifiers to node numbers, instances of this class use a custom map that in the worst case will require 25.5×2log(4n/3) ≤ 68n bytes, where n is the number of distinct identifiers. Storing batches of arc in memory requires 16 bytes per arc.

  • Field Details

  • Constructor Details

    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize, boolean noLoops) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize, boolean noLoops, int batchSize) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir, ProgressLogger pl) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a standard scattered list of arcs.
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize, boolean noLoops) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Throws:
      java.io.IOException
    • ScatteredArcsASCIIGraph

      public ScatteredArcsASCIIGraph​(java.io.InputStream is, Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir, ProgressLogger pl) throws java.io.IOException
      Creates a scattered-arcs ASCII graph.
      Parameters:
      is - an input stream containing a scattered list of arcs.
      function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
      charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
      n - the number of nodes of the graph (used only if function is not null).
      symmetrize - the new graph will be forced to be symmetric.
      noLoops - the new graph will have no loops.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Throws:
      java.io.IOException
  • Method Details

    • numNodes

      public long numNodes()
      Description copied from class: ImmutableGraph
      Returns the number of nodes of this graph.

      Albeit this method is not optional, it is allowed that this method throws an UnsupportedOperationException if this graph has never been entirely traversed using a node iterator. This apparently bizarre behaviour is necessary to support implementations as ArcListASCIIGraph, which do not know the actual number of nodes until a traversal has been completed.

      Specified by:
      numNodes in class ImmutableGraph
      Returns:
      the number of nodes.
    • numArcs

      public long numArcs()
      Description copied from class: ImmutableGraph
      Returns the number of arcs of this graph (optional operation).
      Overrides:
      numArcs in class ImmutableGraph
      Returns:
      the number of arcs.
    • nodeIterator

      public NodeIterator nodeIterator​(long from)
      Description copied from class: ImmutableGraph
      Returns a node iterator for scanning the graph sequentially, starting from the given node.

      This implementation just calls the random-access methods (ImmutableGraph.successors(long) and ImmutableGraph.outdegree(long)). More specific implementations may choose to maintain some extra state to make the enumeration more efficient.

      Overrides:
      nodeIterator in class ImmutableGraph
      Parameters:
      from - the node from which the iterator will iterate.
      Returns:
      a NodeIterator for accessing nodes and successors sequentially.
    • main

      public static void main​(java.lang.String[] args) throws java.lang.IllegalArgumentException, java.lang.SecurityException, java.io.IOException, JSAPException, java.lang.ClassNotFoundException
      Throws:
      java.lang.IllegalArgumentException
      java.lang.SecurityException
      java.io.IOException
      JSAPException
      java.lang.ClassNotFoundException