Class Transform

java.lang.Object
it.unimi.dsi.big.webgraph.Transform

public class Transform extends Object
Static methods that manipulate immutable graphs.

Most methods take an ImmutableGraph (along with some other data, that depend on the kind of transformation), and return another ImmutableGraph that represents the transformed version.

  • Field Details

    • NO_LOOPS

      public static final it.unimi.dsi.big.webgraph.Transform.NoLoops NO_LOOPS
      A singleton providing an arc filter that rejects loops.
  • Method Details

    • filterArcs

      public static ImmutableGraph filterArcs(ImmutableGraph graph, Transform.ArcFilter filter, ProgressLogger ignored)
      Returns a graph with some arcs eventually stripped, according to the given filter.
      Parameters:
      graph - a graph.
      filter - the filter (telling whether each arc should be kept or not).
      ignored - a progress logger, which will be ignored.
      Returns:
      the filtered graph.
    • filterArcs

      Returns a labelled graph with some arcs eventually stripped, according to the given filter.
      Parameters:
      graph - a labelled graph.
      filter - the filter (telling whether each arc should be kept or not).
      ignored - a progress logger, which will be ignored.
      Returns:
      the filtered graph.
    • filterArcs

      public static ImmutableGraph filterArcs(ImmutableGraph graph, Transform.ArcFilter filter)
      Returns a graph with some arcs eventually stripped, according to the given filter.
      Parameters:
      graph - a graph.
      filter - the filter (telling whether each arc should be kept or not).
      Returns:
      the filtered graph.
    • filterArcs

      Returns a labelled graph with some arcs eventually stripped, according to the given filter.
      Parameters:
      graph - a labelled graph.
      filter - the filter (telling whether each arc should be kept or not).
      Returns:
      the filtered graph.
    • symmetrizeOffline

      public static ImmutableGraph symmetrizeOffline(ImmutableGraph g, int batchSize) throws IOException
      Returns a symmetrized graph using an offline transposition.
      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Returns:
      the symmetrized graph.
      Throws:
      IOException
      See Also:
    • symmetrizeOffline

      public static ImmutableGraph symmetrizeOffline(ImmutableGraph g, int batchSize, File tempDir) throws IOException
      Returns a symmetrized graph using an offline transposition.
      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      the symmetrized graph.
      Throws:
      IOException
      See Also:
    • symmetrizeOffline

      public static ImmutableGraph symmetrizeOffline(ImmutableGraph g, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns a symmetrized graph using an offline transposition.

      The symmetrized graph is the union of a graph and of its transpose. This method will compute the transpose on the fly using transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger).

      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Returns:
      the symmetrized graph.
      Throws:
      IOException
    • simplify

      public static ImmutableGraph simplify(ImmutableGraph g, ImmutableGraph t, ProgressLogger pl)
      Returns a simplified (loopless and symmetric) graph using the graph and its transpose.
      Parameters:
      g - the source graph.
      t - the graph g transposed.
      pl - a progress logger, or null.
      Returns:
      the simplified (loopless and symmetric) graph.
    • simplify

      public static ImmutableGraph simplify(ImmutableGraph g, ImmutableGraph t)
      Returns a simplified (loopless and symmetric) graph using the graph and its transpose.
      Parameters:
      g - the source graph.
      t - the graph g transposed.
      Returns:
      the simplified (loopless and symmetric) graph.
    • simplifyOffline

      public static ImmutableGraph simplifyOffline(ImmutableGraph g, int batchSize) throws IOException
      Returns a simplified (loopless and symmetric) graph using an offline transposition.
      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Returns:
      the simplified (loopless and symmetric) graph.
      Throws:
      IOException
      See Also:
    • simplifyOffline

      public static ImmutableGraph simplifyOffline(ImmutableGraph g, int batchSize, File tempDir) throws IOException
      Returns a simplified (loopless and symmetric) graph using an offline transposition.
      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      the simplified (loopless and symmetric) graph.
      Throws:
      IOException
      See Also:
    • simplifyOffline

      public static ImmutableGraph simplifyOffline(ImmutableGraph g, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns a simplified graph(loopless and symmetric) using an offline transposition.

      The simplified graph is the union of a graph and of its transpose, with the loops removed. This method will compute the transpose on the fly using transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger).

      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Returns:
      the simplified (loopless and symmetric) graph.
      Throws:
      IOException
    • processBatch

      public static int processBatch(int n, long[] source, long[] target, File tempDir, List<File> batches) throws IOException
      Sorts the given source and target arrays w.r.t. the target and stores them in a temporary file.
      Parameters:
      n - the index of the last element to be sorted (exclusive).
      source - the source array.
      target - the target array.
      tempDir - a temporary directory where to store the sorted arrays, or null
      batches - a list of files to which the batch file will be added.
      Returns:
      the number of pairs in the batch (might be less than n because duplicates are eliminated).
      Throws:
      IOException
    • transposeOffline

      public static ImmutableSequentialGraph transposeOffline(ImmutableGraph g, int batchSize) throws IOException
      Returns an immutable graph obtained by reversing all arcs in g, using an offline method.
      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
      See Also:
    • transposeOffline

      public static ImmutableSequentialGraph transposeOffline(ImmutableGraph g, int batchSize, File tempDir) throws IOException
      Returns an immutable graph obtained by reversing all arcs in g, using an offline method.
      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
      See Also:
    • transposeOffline

      public static ImmutableSequentialGraph transposeOffline(ImmutableGraph g, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns an immutable graph obtained by reversing all arcs in g, using an offline method.

      This method creates a number of sorted batches on disk containing arcs represented by a pair of gap-compressed long integers ordered by target and returns an ImmutableGraph that can be accessed only using a node iterator. The node iterator merges on the fly the batches, providing a transposed graph. The files are marked with File.deleteOnExit(), so they should disappear when the JVM exits. An additional safety-net finaliser tries to delete the batches, too.

      Note that each NodeIterator returned by the transpose requires opening all batches at the same time. The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case, another safety-net finaliser closes all files when the iterator is collected.

      This method can process offline graphs.

      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
    • logBatches

      protected static void logBatches(ObjectArrayList<File> batches, long pairs, ProgressLogger pl)
    • mapOffline

      public static ImmutableSequentialGraph mapOffline(ImmutableGraph g, long[][] map, int batchSize) throws IOException
      Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via a big array.
      Parameters:
      g - an immutable graph.
      map - the transformation map.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Returns:
      an immutable, sequentially accessible graph obtained by transforming g.
      Throws:
      IOException
      See Also:
    • mapOffline

      public static ImmutableSequentialGraph mapOffline(ImmutableGraph g, long[][] map, int batchSize, File tempDir) throws IOException
      Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via a big array.
      Parameters:
      g - an immutable graph.
      map - the transformation map.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      an immutable, sequentially accessible graph obtained by transforming g.
      Throws:
      IOException
      See Also:
    • mapOffline

      public static ImmutableSequentialGraph mapOffline(ImmutableGraph g, long[][] map, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Remaps the the graph nodes through a partial function specified via a big array, using an offline method.

      More specifically, LongBigArrays.length(map)=g.numNodes(), and LongBigArrays.get(map, i) is the new name of node i, or -1 if the node should not be mapped. If some index appearing in map is larger than or equal to the number of nodes of g, the resulting graph is enlarged correspondingly.

      Arcs are mapped in the obvious way; in other words, there is an arc from LongBigArrays.get(map, i) to LongBigArrays.get(map, j) (both nonnegative) in the transformed graph iff there was an arc from i to j in the original graph.

      Note that if map is bijective, the returned graph is simply a permutation of the original graph. Otherwise, the returned graph is obtained by deleting nodes mapped to -1, quotienting nodes w.r.t. the equivalence relation induced by the fibres of map and renumbering the result, always according to map. See transposeOffline(ImmutableGraph, int, File, ProgressLogger) for implementation and performance-related details.

      Parameters:
      g - an immutable graph.
      map - the transformation map.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Returns:
      an immutable, sequentially accessible graph obtained by transforming g.
      Throws:
      IOException
    • transposeOffline

      public static ArcLabelledImmutableGraph transposeOffline(ArcLabelledImmutableGraph g, int batchSize) throws IOException
      Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.
      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
      See Also:
    • transposeOffline

      public static ArcLabelledImmutableGraph transposeOffline(ArcLabelledImmutableGraph g, int batchSize, File tempDir) throws IOException
      Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.
      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
      See Also:
    • transposeOffline

      public static ArcLabelledImmutableGraph transposeOffline(ArcLabelledImmutableGraph g, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.

      This method creates a number of sorted batches on disk containing arcs represented by a pair of long integers in DataInput format ordered by target and returns an ImmutableGraph that can be accessed only using a node iterator. The node iterator merges on the fly the batches, providing a transposed graph. The files are marked with File.deleteOnExit(), so they should disappear when the JVM exits. An additional safety-net finaliser tries to delete the batches, too. As far as labels are concerned, they are temporarily stored in an in-memory bit stream, that is permuted when it is stored on the disk

      Note that each NodeIterator returned by the transpose requires opening all batches at the same time. The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case, another safety-net finaliser closes all files when the iterator is collected.

      This method can process offline graphs. Note that no method to transpose on-line arc-labelled graph is provided currently.

      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
    • union

      Returns the union of two arc-labelled immutable graphs.

      The two arguments may differ in the number of nodes, in which case the resulting graph will be large as the larger graph.

      Parameters:
      g0 - the first graph.
      g1 - the second graph.
      labelMergeStrategy - the strategy used to merge labels when the same arc is present in both graphs; if null, Labels.KEEP_FIRST_MERGE_STRATEGY is used.
      Returns:
      the union of the two graphs.
    • union

      public static ImmutableGraph union(ImmutableGraph g0, ImmutableGraph g1)
      Returns the union of two immutable graphs.

      The two arguments may differ in the number of nodes, in which case the resulting graph will be large as the larger graph.

      Parameters:
      g0 - the first graph.
      g1 - the second graph.
      Returns:
      the union of the two graphs.
    • compose

      public static ImmutableGraph compose(ImmutableGraph g0, ImmutableGraph g1)
      Returns the composition (a.k.a. matrix product) of two immutable graphs.

      The two arguments may differ in the number of nodes, in which case the resulting graph will be large as the larger graph.

      Parameters:
      g0 - the first graph.
      g1 - the second graph.
      Returns:
      the composition of the two graphs.
    • compose

      Returns the composition (a.k.a. matrix product) of two arc-labelled immutable graphs.

      The two arguments may differ in the number of nodes, in which case the resulting graph will be large as the larger graph.

      Parameters:
      g0 - the first graph.
      g1 - the second graph.
      strategy - a label semiring.
      Returns:
      the composition of the two graphs.
      Implementation Specification:
      This implementation requires outdegrees smaller than 232.
    • grayCodePermutation

      public static long[][] grayCodePermutation(ImmutableGraph g)
      Returns a permutation that would make the given graph adjacency lists in Gray-code order.

      Gray codes list all sequences of n zeros and ones in such a way that adjacent lists differ by exactly one bit. If we assign to each row of the adjacency matrix of a graph its index as a Gray code, we obtain a permutation that will make similar lines nearer.

      Note that since a graph permutation permutes both rows and columns, this transformation is not idempotent: the Gray-code permutation produced from a matrix that has been Gray-code sorted will not be, in general, the identity.

      The important feature of Gray-code ordering is that it is completely endogenous (e.g., determined by the graph itself), contrarily to, say, lexicographic URL ordering (which relies on the knowledge of the URL associated to each node).

      Parameters:
      g - an immutable graph.
      Returns:
      the permutation that would order the graph adjacency lists by Gray order (you can just pass it to mapOffline(ImmutableGraph, long[][], int, File, ProgressLogger)).
    • randomPermutation

      public static long[][] randomPermutation(ImmutableGraph g, long seed)
      Returns a random permutation for a given graph.
      Parameters:
      g - an immutable graph.
      seed - for XoRoShiRo128PlusRandom.
      Returns:
      a random permutation for the given graph
    • lexicographicalPermutation

      public static long[][] lexicographicalPermutation(ImmutableGraph g)
      Returns a permutation that would make the given graph adjacency lists in lexicographical order.

      Note that since a graph permutation permutes both rows and columns, this transformation is not idempotent: the lexicographical permutation produced from a matrix that has been lexicographically sorted will not be, in general, the identity.

      The important feature of lexicographical ordering is that it is completely endogenous (e.g., determined by the graph itself), contrarily to, say, lexicographic URL ordering (which relies on the knowledge of the URL associated to each node).

      Warning: rows are numbered from zero from the left. This means, for instance, that nodes with an arc towards node zero are lexicographically smaller than nodes without it.

      Parameters:
      g - an immutable graph.
      Returns:
      the permutation that would order the graph adjacency lists by lexicographical order (you can just pass it to mapOffline(ImmutableGraph, long[][], int)).
    • main

      Throws:
      IOException
      IllegalArgumentException
      SecurityException
      InstantiationException
      IllegalAccessException
      InvocationTargetException
      NoSuchMethodException
      ClassNotFoundException
      JSAPException