Class Transform

java.lang.Object
it.unimi.dsi.webgraph.Transform

public class Transform
extends Object
Static methods that manipulate immutable graphs.

Most methods take an ImmutableGraph (along with some other data, that depend on the kind of transformation), and return another ImmutableGraph that represents the transformed version.

  • Field Details

    • NO_LOOPS

      public static final it.unimi.dsi.webgraph.Transform.NoLoops NO_LOOPS
      A singleton providing an arc filter that rejects loops.
  • Method Details

    • filterArcs

      public static ImmutableGraph filterArcs​(ImmutableGraph graph, Transform.ArcFilter filter)
      Returns a graph with some arcs eventually stripped, according to the given filter.
      Parameters:
      graph - a graph.
      filter - the filter (telling whether each arc should be kept or not).
      Returns:
      the filtered graph.
    • filterArcs

      public static ImmutableGraph filterArcs​(ImmutableGraph graph, Transform.ArcFilter filter, ProgressLogger ignored)
      Returns a graph with some arcs eventually stripped, according to the given filter.
      Parameters:
      graph - a graph.
      filter - the filter (telling whether each arc should be kept or not).
      ignored - a progress logger.
      Returns:
      the filtered graph.
    • filterArcs

      Returns a labelled graph with some arcs eventually stripped, according to the given filter.
      Parameters:
      graph - a labelled graph.
      filter - the filter (telling whether each arc should be kept or not).
      Returns:
      the filtered graph.
    • filterArcs

      Returns a labelled graph with some arcs eventually stripped, according to the given filter.
      Parameters:
      graph - a labelled graph.
      filter - the filter (telling whether each arc should be kept or not).
      ignored - a progress logger.
      Returns:
      the filtered graph.
    • map

      public static ImmutableGraph map​(ImmutableGraph g, int[] map, ProgressLogger pl)
      Remaps the graph nodes through a partial function specified via an array. More specifically, map.length=g.numNodes(), and map[i] is the new name of node i, or -1 if the node should not be mapped. If some index appearing in map is larger than or equal to the number of nodes of g, the resulting graph is enlarged correspondingly.

      Arcs are mapped in the obvious way; in other words, there is an arc from map[i] to map[j] (both nonnegative) in the transformed graph iff there was an arc from i to j in the original graph.

      Note that if map is bijective, the returned graph is simply a permutation of the original graph. Otherwise, the returned graph is obtained by deleting nodes mapped to -1, quotienting nodes w.r.t. the equivalence relation induced by the fibres of map and renumbering the result, always according to map.

      This method requires ImmutableGraph.randomAccess() random access.

      Parameters:
      g - the graph to be transformed.
      map - the transformation map.
      pl - a progress logger to be used during the precomputation, or null.
      Returns:
      the transformed graph (provides random access.
    • map

      public static ImmutableGraph map​(ImmutableGraph g, int[] f)
      Remaps the graph nodes through a function specified via an array.
      Parameters:
      g - the graph to be transformed.
      f - the transformation map.
      Returns:
      the transformed graph.
      See Also:
      map(ImmutableGraph, int[], ProgressLogger)
    • symmetrizeOffline

      public static ImmutableGraph symmetrizeOffline​(ImmutableGraph g, int batchSize) throws IOException
      Returns a symmetrized graph using an offline transposition.
      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Returns:
      the symmetrized graph.
      Throws:
      IOException
      See Also:
      symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)
    • symmetrizeOffline

      public static ImmutableGraph symmetrizeOffline​(ImmutableGraph g, int batchSize, File tempDir) throws IOException
      Returns a symmetrized graph using an offline transposition.
      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      the symmetrized graph.
      Throws:
      IOException
      See Also:
      symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)
    • symmetrizeOffline

      public static ImmutableGraph symmetrizeOffline​(ImmutableGraph g, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns a symmetrized graph using an offline transposition.

      The symmetrized graph is the union of a graph and of its transpose. This method will compute the transpose on the fly using transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger).

      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Returns:
      the symmetrized graph.
      Throws:
      IOException
    • simplify

      public static ImmutableGraph simplify​(ImmutableGraph g, ImmutableGraph t, ProgressLogger pl)
      Returns a simplified (loopless and symmetric) graph using the graph and its transpose.
      Parameters:
      g - the source graph.
      t - the graph g transposed.
      pl - a progress logger, or null.
      Returns:
      the simplified (loopless and symmetric) graph.
    • simplify

      public static ImmutableGraph simplify​(ImmutableGraph g, ImmutableGraph t)
      Returns a simplified (loopless and symmetric) graph using the graph and its transpose.
      Parameters:
      g - the source graph.
      t - the graph g transposed.
      Returns:
      the simplified (loopless and symmetric) graph.
    • simplifyOffline

      public static ImmutableGraph simplifyOffline​(ImmutableGraph g, int batchSize) throws IOException
      Returns a simplified (loopless and symmetric) graph using an offline transposition.
      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Returns:
      the simplified (loopless and symmetric) graph.
      Throws:
      IOException
      See Also:
      simplifyOffline(ImmutableGraph, int, File, ProgressLogger)
    • simplifyOffline

      public static ImmutableGraph simplifyOffline​(ImmutableGraph g, int batchSize, File tempDir) throws IOException
      Returns a simplified (loopless and symmetric) graph using an offline transposition.
      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      the simplified (loopless and symmetric) graph.
      Throws:
      IOException
      See Also:
      simplifyOffline(ImmutableGraph, int, File, ProgressLogger)
    • simplifyOffline

      public static ImmutableGraph simplifyOffline​(ImmutableGraph g, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns a simplified graph(loopless and symmetric) using an offline transposition.

      The simplified graph is the union of a graph and of its transpose, with the loops removed. This method will compute the transpose on the fly using transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger).

      Parameters:
      g - the source graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Returns:
      the simplified (loopless and symmetric) graph.
      Throws:
      IOException
    • symmetrize

      public static ImmutableGraph symmetrize​(ImmutableGraph g, ImmutableGraph t, ProgressLogger pl)
      Returns a symmetrized graph.

      The symmetrized graph is the union of a graph and of its transpose. This method will use the provided transposed graph, if any, instead of computing it on the fly.

      Parameters:
      g - the source graph.
      t - the graph g transposed; if null, the transposed graph will be computed on the fly.
      pl - a progress logger, or null.
      Returns:
      the symmetrized graph.
    • symmetrize

      public static ImmutableGraph symmetrize​(ImmutableGraph g, ImmutableGraph t)
      Returns a symmetrized graph.

      The symmetrized graph is the union of a graph and of its transpose. This method will use the provided transposed graph, if any, instead of computing it on the fly.

      Parameters:
      g - the source graph.
      t - the graph g transposed; if null, the transposed graph will be computed on the fly.
      Returns:
      the symmetrized graph.
    • symmetrize

      public static ImmutableGraph symmetrize​(ImmutableGraph g, ProgressLogger pl)
      Returns a symmetrized graph.
      Parameters:
      g - the source graph.
      pl - a progress logger.
      Returns:
      the symmetrized graph.
      See Also:
      symmetrize(ImmutableGraph, ImmutableGraph, ProgressLogger)
    • symmetrize

      public static ImmutableGraph symmetrize​(ImmutableGraph g)
      Returns a symmetrized graph.
      Parameters:
      g - the source graph.
      Returns:
      the symmetrized graph.
      See Also:
      symmetrize(ImmutableGraph, ImmutableGraph, ProgressLogger)
    • transpose

      public static ImmutableGraph transpose​(ImmutableGraph g, ProgressLogger pl)
      Returns an immutable graph obtained by reversing all arcs in g.

      This method can process offline graphs).

      Parameters:
      g - an immutable graph.
      pl - a progress logger, or null.
      Returns:
      an immutable graph obtained by transposing g.
    • processBatch

      public static int processBatch​(int n, int[] source, int[] target, File tempDir, List<File> batches) throws IOException
      Sorts the given source and target arrays w.r.t. the target and stores them in a temporary file.
      Parameters:
      n - the index of the last element to be sorted (exclusive).
      source - the source array.
      target - the target array.
      tempDir - a temporary directory where to store the sorted arrays, or null
      batches - a list of files to which the batch file will be added.
      Returns:
      the number of pairs in the batch (might be less than n because duplicates are eliminated).
      Throws:
      IOException
    • transposeOffline

      public static ImmutableSequentialGraph transposeOffline​(ImmutableGraph g, int batchSize) throws IOException
      Returns an immutable graph obtained by reversing all arcs in g, using an offline method.
      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
      See Also:
      transposeOffline(ImmutableGraph, int, File, ProgressLogger)
    • transposeOffline

      public static ImmutableSequentialGraph transposeOffline​(ImmutableGraph g, int batchSize, File tempDir) throws IOException
      Returns an immutable graph obtained by reversing all arcs in g, using an offline method.
      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
      See Also:
      transposeOffline(ImmutableGraph, int, File, ProgressLogger)
    • transposeOffline

      public static ImmutableSequentialGraph transposeOffline​(ImmutableGraph g, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns an immutable graph obtained by reversing all arcs in g, using an offline method.

      This method should be used to transpose very large graph in case transpose(ImmutableGraph) requires too much memory. It creates a number of sorted batches on disk containing arcs represented by a pair of gap-compressed integers ordered by target and returns an ImmutableGraph that can be accessed only using a node iterator. The node iterator merges on the fly the batches, providing a transposed graph. The files are marked with File.deleteOnExit(), so they should disappear when the JVM exits. An additional safety-net finaliser tries to delete the batches, too.

      Note that each NodeIterator returned by the transpose requires opening all batches at the same time. The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case, another safety-net finaliser closes all files when the iterator is collected.

      This method can process offline graphs.

      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
    • logBatches

      protected static void logBatches​(ObjectArrayList<File> batches, long pairs, ProgressLogger pl)
    • mapOffline

      public static ImmutableSequentialGraph mapOffline​(ImmutableGraph g, int[] map, int batchSize) throws IOException
      Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via an array.
      Parameters:
      g - an immutable graph.
      map - the transformation map.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      Returns:
      an immutable, sequentially accessible graph obtained by transforming g.
      Throws:
      IOException
      See Also:
      mapOffline(ImmutableGraph, int[], int, File, ProgressLogger)
    • mapOffline

      public static ImmutableSequentialGraph mapOffline​(ImmutableGraph g, int[] map, int batchSize, File tempDir) throws IOException
      Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via an array.
      Parameters:
      g - an immutable graph.
      map - the transformation map.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      an immutable, sequentially accessible graph obtained by transforming g.
      Throws:
      IOException
      See Also:
      mapOffline(ImmutableGraph, int[], int, File, ProgressLogger)
    • mapOffline

      public static ImmutableSequentialGraph mapOffline​(ImmutableGraph g, int[] map, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via an array. See map(ImmutableGraph, int[], ProgressLogger) for the semantics of this method and transpose(ImmutableGraph, ProgressLogger) for implementation and performance-related details.
      Parameters:
      g - an immutable graph.
      map - the transformation map.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger, or null.
      Returns:
      an immutable, sequentially accessible graph obtained by transforming g.
      Throws:
      IOException
    • transposeOffline

      public static ArcLabelledImmutableGraph transposeOffline​(ArcLabelledImmutableGraph g, int batchSize) throws IOException
      Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.
      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
      See Also:
      transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger)
    • transposeOffline

      public static ArcLabelledImmutableGraph transposeOffline​(ArcLabelledImmutableGraph g, int batchSize, File tempDir) throws IOException
      Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.
      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
      See Also:
      transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger)
    • transposeOffline

      public static ArcLabelledImmutableGraph transposeOffline​(ArcLabelledImmutableGraph g, int batchSize, File tempDir, ProgressLogger pl) throws IOException
      Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.

      This method should be used to transpose very large graph in case transpose(ImmutableGraph) requires too much memory. It creates a number of sorted batches on disk containing arcs represented by a pair of integers in DataInput format ordered by target and returns an ImmutableGraph that can be accessed only using a node iterator. The node iterator merges on the fly the batches, providing a transposed graph. The files are marked with File.deleteOnExit(), so they should disappear when the JVM exits. An additional safety-net finaliser tries to delete the batches, too. As far as labels are concerned, they are temporarily stored in an in-memory bit stream, that is permuted when it is stored on the disk

      Note that each NodeIterator returned by the transpose requires opening all batches at the same time. The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case, another safety-net finaliser closes all files when the iterator is collected.

      This method can process offline graphs. Note that no method to transpose on-line arc-labelled graph is provided currently.

      Parameters:
      g - an immutable graph.
      batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
      tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
      pl - a progress logger.
      Returns:
      an immutable, sequentially accessible graph obtained by transposing g.
      Throws:
      IOException
    • transpose

      public static ImmutableGraph transpose​(ImmutableGraph g)
      Returns an immutable graph obtained by reversing all arcs in g.

      This method can process offline graphs.

      Parameters:
      g - an immutable graph.
      Returns:
      an immutable graph obtained by transposing g.
      See Also:
      transpose(ImmutableGraph, ProgressLogger)
    • union

      Returns the union of two arc-labelled immutable graphs.

      The two arguments may differ in the number of nodes, in which case the resulting graph will be large as the larger graph.

      Parameters:
      g0 - the first graph.
      g1 - the second graph.
      labelMergeStrategy - the strategy used to merge labels when the same arc is present in both graphs; if null, Labels.KEEP_FIRST_MERGE_STRATEGY is used.
      Returns:
      the union of the two graphs.
    • union

      public static ImmutableGraph union​(ImmutableGraph g0, ImmutableGraph g1)
      Returns the union of two immutable graphs.

      The two arguments may differ in the number of nodes, in which case the resulting graph will be large as the larger graph.

      Parameters:
      g0 - the first graph.
      g1 - the second graph.
      Returns:
      the union of the two graphs.
    • compose

      public static ImmutableGraph compose​(ImmutableGraph g0, ImmutableGraph g1)
      Returns the composition (a.k.a. matrix product) of two immutable graphs.

      The two arguments may differ in the number of nodes, in which case the resulting graph will be large as the larger graph.

      Parameters:
      g0 - the first graph.
      g1 - the second graph.
      Returns:
      the composition of the two graphs.
    • compose

      Returns the composition (a.k.a. matrix product) of two arc-labelled immutable graphs.

      The two arguments may differ in the number of nodes, in which case the resulting graph will be large as the larger graph.

      Parameters:
      g0 - the first graph.
      g1 - the second graph.
      strategy - a label semiring.
      Returns:
      the composition of the two graphs.
    • line

      public static ImmutableSequentialGraph line​(ImmutableGraph g, String mapBasename, File tempDir, int batchSize, ProgressLogger pl) throws IOException
      Computes the line graph of a given symmetric graph. The line graph of g is a graph, whose nodes are identified with pairs of the form <xy> where x and y are nodes of g and <xy> is an arc of g. Moreover, there is an arc from <xy> to <yz>.

      Two additional files are created, with names stemmed from mapBasename; the i-th entries of the two files identify the source and target node (in the original graph) corresponding the node i in the line graph.

      Parameters:
      g - the graph (it must be symmetric and loopless).
      mapBasename - the basename of two files that will, at the end, contain as many integers as the number of nodes in the line graph: the i-th integer in the file mapBasename.source will contain the source of the arc corresponding to the i-th node in the line graph, and similarly mapBasename.target will give the target.
      tempDir - the temporary directory to be used.
      batchSize - the size used for batches.
      pl - the progress logger to be used.
      Returns:
      the line graph of g.
      Throws:
      IOException
    • grayCodePermutation

      public static int[] grayCodePermutation​(ImmutableGraph g)
      Returns a permutation that would make the given graph adjacency lists in Gray-code order.

      Gray codes list all sequences of n zeros and ones in such a way that adjacent lists differ by exactly one bit. If we assign to each row of the adjacency matrix of a graph its index as a Gray code, we obtain a permutation that will make similar lines nearer.

      Note that since a graph permutation permutes both rows and columns, this transformation is not idempotent: the Gray-code permutation produced from a matrix that has been Gray-code sorted will not be, in general, the identity.

      The important feature of Gray-code ordering is that it is completely endogenous (e.g., determined by the graph itself), contrarily to, say, lexicographic URL ordering (which relies on the knowledge of the URL associated to each node).

      Parameters:
      g - an immutable graph.
      Returns:
      the permutation that would order the graph adjacency lists by Gray order (you can just pass it to map(ImmutableGraph, int[], ProgressLogger)).
    • randomPermutation

      public static int[] randomPermutation​(ImmutableGraph g, long seed)
      Returns a random permutation for a given graph.
      Parameters:
      g - an immutable graph.
      seed - for XoRoShiRo128PlusRandom.
      Returns:
      a random permutation for the given graph
    • hostByHostGrayCodePermutation

      public static int[] hostByHostGrayCodePermutation​(ImmutableGraph g, int[] hostMap, boolean strict)
      Returns a permutation that would make the given graph adjacency lists in host-by-host Gray-code order.

      This permutation differs from grayCodePermutation(ImmutableGraph) in that Gray codes are computed host by host. There are two variants, strict and loose. In the first case, we restrict the adjacency matrix to the submatrix corresponding to a host and compute the ordering. In the second case, we just restrict to the rows corresponding to a host, but then entire rows are used to compute the ordering.

      Parameters:
      g - an immutable graph.
      hostMap - an array mapping each URL to its host (it is sufficient that each host is assigned a distinct number).
      strict - if true, host-by-host Gray code computation will be strict, that is, the order is computed only between columns of the same host of the rows.
      Returns:
      the permutation that would order the graph adjacency lists by host-by-host Gray order (you can just pass it to map(ImmutableGraph, int[], ProgressLogger)).
    • lexicographicalPermutation

      public static int[] lexicographicalPermutation​(ImmutableGraph g)
      Returns a permutation that would make the given graph adjacency lists in lexicographical order.

      Note that since a graph permutation permutes both rows and columns, this transformation is not idempotent: the lexicographical permutation produced from a matrix that has been lexicographically sorted will not be, in general, the identity.

      The important feature of lexicographical ordering is that it is completely endogenous (e.g., determined by the graph itself), contrarily to, say, lexicographic URL ordering (which relies on the knowledge of the URL associated to each node).

      Warning: rows are numbered from zero from the left. This means, for instance, that nodes with an arc towards node zero are lexicographically smaller than nodes without it.

      Parameters:
      g - an immutable graph.
      Returns:
      the permutation that would order the graph adjacency lists by lexicographical order (you can just pass it to map(ImmutableGraph, int[], ProgressLogger)).
    • ensureNumArgs

      protected static boolean ensureNumArgs​(String[] param, int n)
      Ensures that the arguments are exactly n, if n is nonnegative, or at least -n, otherwise.
    • load

      Loads a graph with given data and returns it.
      Parameters:
      graphClass - the class of the graph to be loaded.
      baseName - the graph basename.
      offline - whether the graph is to be loaded in an offline fashion.
      pl - a progress logger.
      Returns:
      the loaded graph.
      Throws:
      IllegalArgumentException
      SecurityException
      IllegalAccessException
      InvocationTargetException
      NoSuchMethodException
      IOException
    • main

      Throws:
      IOException
      IllegalArgumentException
      SecurityException
      InstantiationException
      IllegalAccessException
      InvocationTargetException
      NoSuchMethodException
      ClassNotFoundException
      JSAPException