In Fig. The index consists of border nodes and POIs. Thus, the number of index nodes is at most the number of nodes in the temporal graph. We introduce three types of edges into the index. B B edges connect border nodes between different cells, and they are a subset of the temporal graph edges. B C edges connect border nodes in a single cell, and their cardinality is at most quadratic in the number of border nodes.
The numbers of B C and B P edges depend only on the subset of temporal graph nodes that are in a single cell. The numbers do not depend on the graph size. In sparse graphs, where many nodes have only a few edges, the reachability index may grow larger than the temporal graph: we can remove only a small number of original edges but need to insert new B C and B P edges. Each edge has as many edge cost values as there are departure times from a node.
The edge costs are computed for each single cell in isolation, making parallel computation possible. In particular, the edge cost of a specific border node at a specific departure time is independent of all other edge costs.
The index size, as well as the size of the temporal graph, is dominated by the size of the schedule, i. After computing the edge costs in the index, we observe that many different departure times have the same arrival time at the destination.
It is enough to keep only one connection per arrival time, namely the one with the maximum departure time. We leverage that and compact the index by reducing the number of connections as follows. Section 7. The core idea of our reachability algorithm is to expand cell by cell rather than edge by edge.
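The compaction rule stated above can be illustrated with a minimal sketch. It assumes that the connections of one index edge are given as (departure, arrival) pairs in minutes since midnight; the function name `compact_connections` and the sample data are hypothetical and not taken from the paper.

```python
def compact_connections(connections):
    """Keep, per arrival time, only the connection with the latest departure.

    connections: iterable of (departure, arrival) pairs of one index edge
                 (assumed representation, times in minutes since midnight).
    Returns the surviving connections sorted by departure time.
    """
    latest_departure = {}
    for dep, arr in connections:
        # A later departure with the same arrival time dominates earlier ones.
        if arr not in latest_departure or dep > latest_departure[arr]:
            latest_departure[arr] = dep
    return sorted((dep, arr) for arr, dep in latest_departure.items())


# Five precomputed connections that collapse to two distinct arrival times.
conns = [(480, 510), (485, 510), (490, 510), (495, 530), (500, 530)]
compacted = compact_connections(conns)
print(compacted)
```

In this toy input, three departures share the arrival time 510 and two share 530, so only two connections remain after compaction.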
The B B edges between border nodes of different cells allow us to expand to the neighboring cells; the B C edges between border nodes of the same cell reflect the time to cross a cell; the direct B P edges from border nodes to POIs allow for a quick evaluation of which POIs can be reached. In addition, we discuss a heuristic to avoid unnecessary edge expansions, and the processing of query nodes that are non-border nodes. The closest node v to q is popped from the min-heap (line 4), and the costs for nodes adjacent to v are updated if smaller (from line 9 onward). To retrieve the correct edge cost, we do a binary search in the list of edge costs sorted by departure time (line 9).
Each node is traversed only once. The algorithm terminates when no more nodes with cost lower than the budget are in the heap (line 5). Consider the reachability index in Fig. Regarding the edges within a cell, we observe the following. Consider Algorithm 1 processing a border node b of a cell C_i. When we pop a node v_j in a later round, and if v_j was last updated by b, there is no point in following the edges from v_j to the other nodes in the cell. The cost of accessing the other nodes in the cell through v_j cannot be smaller than the cost of accessing these nodes directly from b since all edge costs are shortest paths.
If, however, v_j was updated through an edge from a neighboring cell, the edges to the other nodes in the cell need to be followed. We exploit this observation to avoid following edges inside a cell that cannot lead to an update and thus do not affect the solution. We flag a node whenever its cost is updated by processing a node from within the same cell, and we remove the flag otherwise. The outgoing edges that must be expanded are selected based on the flag (line 7).
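The expansion with binary search and flags can be sketched as follows. This is not the paper's Algorithm 1 but a simplified illustration under stated assumptions: the index is a dictionary mapping each node to a list of (neighbor, same_cell, connections) triples, connections are (departure, arrival) pairs sorted by departure, and the first connection departing at or after the current time yields the earliest arrival. The names `reachable_pois` and `earliest_arrival` and the toy index are hypothetical.

```python
import heapq
from bisect import bisect_left


def earliest_arrival(connections, t):
    """Binary search for the first connection departing at or after time t.

    connections: list of (departure, arrival) pairs sorted by departure.
    Returns its arrival time, or None if no such connection exists.
    """
    i = bisect_left(connections, (t, float("-inf")))
    return connections[i][1] if i < len(connections) else None


def reachable_pois(index, pois, q, start, budget):
    """Return the POIs reachable from border node q within the time budget."""
    deadline = start + budget
    best = {q: start}        # best known arrival time per node
    flagged = set()          # nodes last updated from within their own cell
    done = set()
    heap = [(start, q)]
    result = set()
    while heap:
        t, v = heapq.heappop(heap)
        if v in done:
            continue         # stale heap entry
        done.add(v)
        if v in pois:
            result.add(v)
        for w, same_cell, conns in index.get(v, []):
            # Flag heuristic: intra-cell edges of a flagged node cannot
            # improve on the direct edges of the node that updated it.
            if same_cell and v in flagged:
                continue
            arr = earliest_arrival(conns, t)
            if arr is None or arr > deadline:
                continue
            if arr < best.get(w, float("inf")):
                best[w] = arr
                if same_cell:
                    flagged.add(w)
                else:
                    flagged.discard(w)
                heapq.heappush(heap, (arr, w))
    return result


# Toy index: cell A holds a1, a2, p1; cell B holds b1, p2.
index = {
    "a1": [("a2", True, [(0, 5), (10, 15)]), ("p1", True, [(0, 3)])],
    "a2": [("a1", True, [(0, 5)]), ("p1", True, [(5, 9)]),
           ("b1", False, [(6, 10), (20, 24)])],
    "b1": [("p2", True, [(10, 14), (30, 34)])],
}
pois = {"p1", "p2"}
print(reachable_pois(index, pois, "a1", start=0, budget=15))
print(reachable_pois(index, pois, "a1", start=0, budget=12))
```

In the toy run, a2 is flagged because it is reached from a1 within the same cell, so only its edge to the neighboring cell is expanded; b1 is reached across cells and therefore expands its intra-cell edge to p2.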
Note that the number of edges within a cell is quadratic in the number of border nodes of that cell. Thanks to the use of flags we avoid unnecessary expansions. The value of k is expected to be small and will often be 1. The reachability index does not contain all nodes of the original graph. If the query node q in cell C_i is not a border node, the algorithm starts the expansion from q in the temporal graph.
All POIs reached in cell C_i are part of the result. We show that the shortest-path costs in the index and the original temporal graph are identical. Otherwise, u and w are not in the same cell (all nodes in a cell are connected with an edge). We show that the cost of the index path is indeed the shortest-path cost. On a path of length two, the costs of edges (u, u_1) and (u_1, w) are precomputed shortest-path costs, and they are therefore correct.
The assumption, however, implies that one of the edge costs could be decreased, i. This argument can be extended edge by edge to paths of arbitrary length.
Shortest-path and reachability queries on road networks, i. Unfortunately, these works cannot be applied readily to public transport networks Bast, An evaluation by Bast et al.
Bast et al. This is due to the time-dependent edge costs of public transport networks, which makes the precomputation efforts of many algorithms infeasible. Dijkstra-based approaches include isochrone algorithms for multimodal networks Bauer et al. Since all edges in the isochrone must be expanded, this approach does not scale to large networks. Many works fall into the category of labeling approaches. The earliest work, 2-hop labeling Cohen et al.
Recent works strive to decrease the index size and construction time Cheng et al. In TTL, the main idea is to precompute label sets for each node v containing the nodes reachable from v and the nodes that can reach v. TopChain creates a directed acyclic graph (DAG), where each node represents a departure time, and decomposes the DAG to create the label sets.
Creating label sets in both techniques incurs high precomputation costs and large index sizes. To decrease the index size, TopChain only stores K label sets, called chains. The index size of TopChain for small K values is smaller than that of TTL, but there is no guarantee that the query results can be found using the index. Non-labeling techniques include Scalable Transfer Patterns Bast et al. Transfer Patterns require an expensive profile search from each node to find the optimal paths to all other nodes.
CSA organizes a schedule as two sequences of edges. The first sequence contains sorted edges based on arrival times, and the second sequence sorts edges based on departure times. At query time, CSA scans the sequences in linear time to answer earliest arrival path queries.
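The linear scan can be illustrated with a minimal earliest-arrival sketch. It uses only the departure-sorted sequence and ignores transfer times and trip continuity, so it is a simplification rather than the full CSA; the tuple layout (departure, arrival, from_stop, to_stop), the function name, and the timetable are assumptions made for illustration.

```python
def csa_earliest_arrival(connections, source, start):
    """One pass over connections sorted by departure time.

    connections: list of (departure, arrival, from_stop, to_stop),
                 sorted by departure (assumed layout).
    Returns a dict with the earliest arrival time per reached stop.
    """
    inf = float("inf")
    earliest = {source: start}
    for dep, arr, u, v in connections:
        # A connection is usable if we are at u no later than its departure,
        # and useful if it improves the arrival time at v.
        if earliest.get(u, inf) <= dep and arr < earliest.get(v, inf):
            earliest[v] = arr
    return earliest


timetable = sorted([
    (0, 10, "A", "B"),
    (5, 8, "A", "C"),
    (9, 12, "C", "B"),
    (10, 15, "B", "D"),
    (12, 20, "B", "D"),
])
arrivals = csa_earliest_arrival(timetable, "A", 0)
print(arrivals)
```

Each connection is inspected exactly once, which is why the query time is linear in the length of the scanned sequence.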
CHT organizes vertices in hierarchies and applies a contraction technique to reduce the graph size for query processing. SPs are precomputed by adding new edges to the graph, which are leveraged at query time.
These approaches involve expensive precomputations or large index sizes, which limits their scalability. To compute reachability queries as defined in this paper, all techniques based on point-to-point queries require the computation of shortest paths from a given query node to every POI, which does not scale to a large number of POIs. We experimentally evaluate our solution, RQ, and compare it to two competitors, a no-index solution, NI, and a fully-indexed solution, SP.
We report on the index size and efficiency of the algorithms w. Our algorithm, RQ , partitions the input graph in order to build the index. We study the effect of different partitioning techniques discussed in Section 4. We identify one partitioning technique to be used in conjunction with RQ. For the partitioning of the input graphs we use the following Python libraries: for Louvain python-louvain v0.
The no-index solution, NI , operates on the original temporal graph and does not build an index. Section 5. The fully-indexed solution, SP , precomputes and stores all shortest paths from every node in the temporal graph to each POI at every departure time. SP represents the collection of works that index the shortest paths between pairs of nodes cf. Section 6. In terms of lookups per shortest-path, SP is optimal since only a single lookup is required per shortest-path query.
Other solutions for reachability queries based on indices for shortest-path queries cannot outperform SP in terms of lookups per query. Instead, these solutions trade lookup performance for a smaller memory footprint, which is high for SP. Therefore, the number of edge expansions performed by SP for reachability queries cannot be improved by replacing SP with another shortest-path index.
We use two real-world public transport networks represented as temporal graphs, Zurich and Berlin , and one synthetic graph, Synthetic. For these graphs, we chose all transport modes and all connections operating on Mondays.
Each spider-web subgraph has one edge to every neighboring subgraph to its left, right, top, and bottom. This graph simulates loosely connected cities that are densely connected inside. In such a case we would expect a good partitioning to assign the nodes of each spider-web subgraph to a separate partition. Table 1 shows the statistics, where Conn is the number of all connections (departure-arrival pairs) that can be used to cross an edge. Figure 3 visualizes the structure of our public transport networks.
Our solution, RQ , will work with any partitioning of the input graph and compute correct results. However, we observe that the index structure and performance of query answering varies depending on the specific partitioning used to build the index cf. Section 4. We investigate the effect of different partitioning techniques on our solution RQ. In our analysis, we include two community detection algorithms, Louvain Blondel et al.
In Louvain and Leiden, the so-called modularity of the partitioning is optimized to find good communities. To compute the modularity, a resolution and a weight between pairs of nodes need to be specified. We use the default value 1 for the resolution and the number of connections as the edge weight: the more connections exist between two nodes, the better they are connected.
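To make the role of the resolution and the edge weights concrete, the following sketch evaluates the standard weighted modularity of a given partitioning. It assumes an undirected graph given as (u, v, weight) triples, with the weight standing for the number of connections; it only scores a partitioning and does not perform the Louvain or Leiden optimization. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community, resolution=1.0):
    """Weighted modularity Q = sum_c [ L_c / m - resolution * (d_c / 2m)^2 ].

    edges:     iterable of (u, v, weight) of an undirected graph (assumed).
    community: dict mapping each node to its community label.
    L_c is the edge weight inside community c, d_c its total weighted
    degree, and m the total edge weight.
    """
    m = 0.0
    internal = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - resolution * (degree[c] / (2 * m)) ** 2
               for c in degree)


# Two triangles with 10 connections per edge, joined by a 1-connection bridge.
edges = [("a", "b", 10), ("b", "c", 10), ("a", "c", 10),
         ("d", "e", 10), ("e", "f", 10), ("d", "f", 10),
         ("c", "d", 1)]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {n: 0 for n in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(q_split, q_merged)
```

Splitting at the weakly connected bridge scores higher than keeping all nodes in one community, which is the behaviour the connection-count weights are meant to encourage.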
Defining a good number of partitions is not straightforward since this parameter inherently depends on the network structure. We evaluate METIS using the mean of the partition numbers detected by Louvain and Leiden when we compare to these algorithms in terms of index size and query performance. We visualize the partitions for Zurich, Berlin and Synthetic graphs in Figs. Louvain and Leiden auto-detect the number of partitions. METIS requires the number of partitions as an input parameter.
The numbers in Table 2 show that METIS correctly identifies all spider webs when the number of partitions is well chosen, resulting in a low number of border nodes. With the number of partitions that we automatically detect using Louvain and Leiden (44 partitions), however, the performance of METIS drops significantly.
The visualization of the partitions in Fig. One of the optimization goals of METIS is to produce partitions well balanced in size, whereas Louvain and Leiden detect communities regardless of their size. This is confirmed by our results. The partitions of METIS are well balanced, while the community detection algorithms produce some very small and some very large partitions for the real world datasets. Figure 7 shows the distribution of nodes per partition for the three algorithms on Zurich, Berlin and Synthetic.
As discussed in Section 4. A low number of border nodes indicates loose connections between partitions since edges between partitions can exist only between border nodes. For Zurich and Berlin, Leiden exhibits the lowest values for border nodes. We observe that for these datasets the good balance of partition sizes of METIS comes at the cost of more border nodes.
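Counting border nodes for a given partitioning is straightforward. The sketch below assumes that a border node is any node with at least one edge to a node in a different partition, which is consistent with the statement that inter-partition edges exist only between border nodes; the function name and the toy graph are hypothetical.

```python
def border_nodes(edges, partition):
    """Collect the border nodes of every partition.

    edges:     iterable of (u, v) pairs.
    partition: dict mapping each node to its partition (cell) id.
    Returns a dict: cell id -> set of border nodes in that cell.
    """
    borders = {cell: set() for cell in set(partition.values())}
    for u, v in edges:
        if partition[u] != partition[v]:
            # Both endpoints of a crossing edge are border nodes.
            borders[partition[u]].add(u)
            borders[partition[v]].add(v)
    return borders


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 5)]
partition = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
borders = border_nodes(edges, partition)
print(borders)
```

Here the crossing edges (3, 4) and (2, 5) make nodes 2 and 3 border nodes of cell 0 and nodes 4 and 5 border nodes of cell 1; a partitioning with fewer crossing edges would yield fewer border nodes.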
For the Synthetic graph, if METIS is given the optimal number of 36 partitions, all spider webs are detected and optimal partitions are produced, resulting in two border nodes for the spider webs in the four corners, three border nodes for the spider webs on the boundaries of the graph, and four border nodes for all other spider webs. The reason is that METIS strives to balance the partition sizes also at the cost of more border nodes and border edges. Zurich and Berlin include small disconnected components.
Louvain and Leiden detect the small disconnected components and do not further partition them. This explains the zero values for the minimum number of border nodes in Table 2. METIS includes the disconnected components into larger partitions to balance their sizes.
The border nodes resulting from a partitioning define the structure of our RQ index. Table 3 presents the index core size for different partitionings.
The values that increase the index size are the number of nodes (Nodes) and edges (Edges), and the number of connections (Conn), which is the dominating factor. The sum of these three values (Total) is used to compare index sizes among the different partitionings.
Table 3 suggests that the most promising partitioning is Leiden, which results in the smallest overall index core size (Total). An exception is Synthetic if we provide METIS with the optimal number of 36 partitions: the resulting index core is much smaller than the index core for the other partitionings.
In Zurich and Berlin, the average number of nodes and edges using Leiden is the smallest. The number of nodes varies across the samples due to a varying number of POIs that are border nodes. To measure the effect of different partitioning techniques on the querying performance, we count the number of edges that RQ must expand for answering a reachability query. All query nodes are border nodes. Since the different partitionings produce different border nodes, we pick all border nodes that exist in all partitionings as query nodes.
For each query node, we executed a total of 10 different reachability queries varying the starting time , , , , and time budget 60 and minutes. The results for our three input graphs are shown in Fig. The data points are ordered along the x-axis by the number of expanded edges, separately for each data series. The number k of partitions for METIS is different for each dataset to approximately match the number of partitions that Louvain and Leiden produce for the respective setting cf.
Table 2. In Zurich and Synthetic, all three partitioning techniques perform similarly for most of the data points. Leiden outperforms Louvain with respect to the index size. Given a similar number of partitions as Leiden and Louvain, (1) METIS produces more balanced partition sizes, but the partitions have more border nodes on average, and (2) the index size (in particular the number of connections, which dominates the index size) is smaller for Leiden than for METIS.
In terms of query performance, METIS is competitive with the other approaches and in some cases slightly outperforms them. The catch is that the performance of METIS depends on the parameter k , which is hard to choose without running experiments similar to the ones in our tests. The parameters k so determined may not be valid for other networks or query loads. We use RQ with Leiden partitioning as we suggest in Section 7. We compare the index size of RQ with Leiden partitioning to the index sizes of its competitors.
Although NI does not require precomputation, the input graph has to be kept in memory. RQ and SP precompute certain shortest paths and build an index structure that is sufficient to answer reachability queries.
For RQ the index size depends on the position of the border nodes. The index size of RQ is always smaller than that of SP, by up to four orders of magnitude in the Synthetic graph.
Although RQ has significantly fewer nodes and edges than the original graph for Berlin , the number of connections is higher. This is caused by the sparsity of Berlin cf.
Finally, Connections is the number of edge connections stored. For RQ , we list the absolute number of connections after the compaction cf. Table 6 shows the runtime to build our RQ index. The shortest path computations are executed in parallel on 20 cores. All runtimes for Berlin are below 30 min. The high construction time for METIS with 44 partitions is due to the poor partitioning and the resulting high number of connections that require many shortest path computations before compaction, there are ,, connections, each resulting from one shortest path computation.
To evaluate the efficiency, we compare the number of edges that each algorithm has to process in order to find all reachable POIs Fig. Finally, we compare a sample set of well-known empirical networks (neural, social and transportation). We find that, while most of these systems display a pathlength comparable to that of random graphs, only cortical connectomes turn out to be US when contrasted with the boundaries.
In this paper we will refer to small-world networks in the classical sense, where the typical distance between nodes is much smaller than the size of the network.
While our focus is on the properties of the pathlength, the clustering coefficient of the graphs at the boundaries is shown in Supplementary Note 6. We will denote the properties of directed graphs by adding a tilde to the symbols. We identify these boundaries as families of (di)graph configurations which we name US and UL networks, summarised in Fig.
These families arise from a few simple building blocks, Fig. The sparsest connected graphs that can be constructed are named trees, i. Among trees, star and path graphs are the configurations with the shortest and the longest pathlength, respectively. In a star graph, any two nodes can reach each other jumping through the central hub while in a path graph, the whole network needs to be traversed to travel from one end to the other.
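The contrast between star and path graphs can be checked numerically. The sketch below computes the average pathlength by breadth-first search and compares it with the well-known closed forms 2(N-1)/N for a star graph and (N+1)/3 for a path graph on N nodes; the helper names are hypothetical and the adjacency-dictionary representation is an assumption.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_pathlength(adj):
    """Mean distance over all ordered node pairs; infinite if disconnected."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < n:
            return float("inf")
        total += sum(dist.values())
    return total / (n * (n - 1))


def star_graph(n):
    """Node 0 is the hub, connected to all other nodes."""
    adj = {i: set() for i in range(n)}
    for i in range(1, n):
        adj[0].add(i)
        adj[i].add(0)
    return adj


def path_graph(n):
    """Nodes 0..n-1 connected in a chain."""
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


N = 10
l_star = average_pathlength(star_graph(N))
l_path = average_pathlength(path_graph(N))
print(l_star, 2 * (N - 1) / N)
print(l_path, (N + 1) / 3)
```

For N = 10 the star graph stays below two hops on average, whereas the path graph already needs more than three and a half, and the gap grows linearly with N.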
Construction of ultra-short and ultra-long (di)graphs. Star graphs, path graphs, directed rings and complete graphs serve as the starting references to construct (di)graphs of arbitrary density with extremal average pathlength. Edge colour denotes the order of edge addition. Red edges are the last added and green edges the ones added in the previous steps.
These cases are often non-Markovian and reveal novel structures. We name these flower digraphs. Finally, h digraphs with smallest efficiency are achieved by constructing the densest directed acyclic graphs possible, to minimise the contribution of cycles to the path structure of the network.
In the case of digraphs, both US and UL configurations are obtained by adding arcs to a directed ring, Figs. The precise order of link addition differs from case to case. Two findings deserve a special mention. For example, Fig. When studying large networks it is common to find that these are sparse and fragmented into many components.
While the pathlength in these cases is infinite, these networks can still be characterised by their efficiency, which remains a finite quantity, allowing us to zoom into the sparse regime. Thus the contribution of disconnected pairs with infinite distance is null. In the case of graphs, Fig. See Fig. In the following we illustrate how knowledge of the US and UL boundaries frames the space of lengths that networks take, both empirical and models.
The results are shown in Fig. Shaded areas mark the values of pathlength and global efficiency that no network can achieve. Solid lines represent the ranges in which the models are connected and dashed lines correspond to the efficiency when the networks are disconnected.
The locations of the original building blocks (star graphs, path graphs, directed rings and complete graphs) are also shown over the maps for reference. Pathlength and efficiency of characteristic network models. Shaded areas mark values of pathlength and efficiency that no (di)graph of the same size can achieve. The efficiency of random and scale-free graphs undergoes a transition from ultra-long to ultra-short centered at their percolation thresholds.
In this case, the two boundaries emerge from the same point corresponding to a directed ring (open circle). Curves for random and scale-free networks are averages over realisations. Dashed lines represent ranges of density for which the models are disconnected and solid lines represent (di)graphs which are connected.
The pathlength of random, scale-free and ring networks decays with density, as expected, with the three eventually converging onto the lower boundary and becoming US, Fig. However, the decay rates differ across models. Scale-free networks are always shorter than random graphs in the sparser regime, where the length of both models is well above the lower boundary. Figure 2 c, d reproduces the same results in terms of efficiency. An advantage of efficiency is that it always takes a finite value, from zero to one, regardless of whether a network is connected or not.
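That the efficiency stays finite for disconnected networks can be seen in a short sketch. It uses the usual definition of global efficiency as the mean of inverse distances over all ordered node pairs, with unreachable pairs contributing zero; the adjacency-dictionary representation and the helper names are assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Mean of 1/d(i, j) over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    total = 0.0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += 1.0 / d
    return total / (n * (n - 1))


# Four nodes in three configurations (arcs listed per source node).
two_components = {0: [1], 1: [0], 2: [3], 3: [2]}
directed_ring = {0: [1], 1: [2], 2: [3], 3: [0]}
complete = {i: [j for j in range(4) if j != i] for i in range(4)}

e_split = global_efficiency(two_components)
e_ring = global_efficiency(directed_ring)
e_full = global_efficiency(complete)
print(e_split, e_ring, e_full)
```

The two-component graph has an infinite pathlength but an efficiency of 1/3, the directed ring reaches 11/18, and the complete graph attains the maximum of one.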
The reason for this is that SF graphs percolate earlier than random graphs The results for the directed versions of the random and scale-free networks, Fig. We now elucidate how the knowledge of the boundaries allows us to interpret the length of real networks faithfully.
First, we will illustrate how different references may give rise to subjective interpretations. Then, we will show how the boundaries help framing both empirical and model networks into a unified representation. Considering the US and UL boundaries, one could also propose the following re-scaling.
At this point, it should be stressed that the use of relative metrics may lead to biased interpretations, depending on the question(s) we are asking about the data. Thus, they are meant to answer different questions. On the one hand, null-models are constructed upon particular sets of constraints and following generative assumptions on the rules governing how nodes link with each other. The role of expectation values is thus to test whether those hypotheses explain the pathlength observed in the real networks.
On the other hand, limits correspond to the extremal values (di)graphs could take and are independent of generative assumptions. For practical illustration, we study a set of empirical datasets from three different domains: neural and cortical connectomes, social networks and transportation systems, see Table 1.
First, we show that the ranking of these networks with respect to pathlength is very much altered depending on the relative reference chosen, Fig. According to the absolute pathlength, Fig. The network of prison inmates is directed and weakly connected, therefore it has an infinite pathlength. See Supplementary Note 5 for the results presented in terms of global efficiency. We could now ask whether these observations are a trivial consequence of the different sizes of those networks.
In this case, the ranking changes considerably. Comparison of absolute and relative pathlength for selected neural, social and transportation networks. Red crosses indicate cases for which all random graphs generated as benchmark were disconnected and thus had an infinite pathlength. The Prison social network is weakly connected and can thus only be studied by characterising efficiency, see Supplementary Note 5.
When considering random graphs as the reference, Fig. With these results at hand we may tend to believe that, e. If the US limit is taken as reference, a different scenario is found, Fig. The dolphins and the facebook circles are twice as long as the lower boundary and the transportation networks deviate even further.
The comparison to random graphs was not possible for three transportation networks London transportation, Chicago transportation and the U.
Calculated in terms of global efficiency, this comparison shows that these three networks are less efficient than random graphs and lie notably far from the optimal, see Supplementary Note 5.
Notice that transportation networks, in reality, are embedded in space and are nearly planar. The planarity helps sparse networks to be connected 15 , while they would easily be disconnected if this constraint were ignored. Figure 3 e shows the two-point relative pathlengths when both random graphs and ring-lattices are taken as references.
So far, we have evidenced that the use of relative metrics to rank networks according to their length may lead to biased interpretations because the outcome depends on whether the reference is a null-model or a limit. Therefore, a more informative solution than relying on relative metrics is to display all the relevant results, both for real networks and for null-models, into a unified representation.
This representation elucidates the results reported in Fig. First, the space of accessible efficiencies delimited in between the lower and the upper boundaries differs from case to case, depending on the size and density of each network.
Second, the position that both real networks and null-models take within this space is revealed. In the case of the three brain connectomes (cat, macaque and human), their equivalent random graphs match the US boundary. Thus comparing these networks to random graphs is the same as comparing them to the lower limit. However, in sparser networks this is no longer the case. For example, the efficiency of the neural network of Caenorhabditis elegans is close to that of random graphs, but both values depart from the US boundary.
In this case, the network is far from ring lattices (red lines) and from the UL boundary. The opposite is found for transportation networks: their global efficiency and the efficiency of corresponding random graphs lie closer to the UL boundary than to the US. Comparison of global efficiency for selected neural, social and transportation networks. The span of the boundaries differs considerably from case to case because of the different sizes and densities of the networks studied.
For the denser networks e. On the contrary, for the sparser networks e. In summary, the synoptic representation in Fig. But 2 the position networks take with respect to the limits very much differs from case to case, exposing how close each network—real and models—lie from the optimal efficiency or from the worst-case scenario.
In future practical studies, Fig. The expectation value calculated for each null-model will add one datapoint per network and thus allow us to visualise the contribution of every set of constraints. Here, we restricted ourselves to random graphs and ring lattices for illustrative purposes.
Among the many descriptors to characterise complex networks, the average pathlength is a very important one. It lies at the heart of the small-world phenomenon and also plays a crucial role in network dynamics, as short pathlengths facilitate global synchrony 4, 16 or the diffusion of information and diseases 17. Unfortunately, the pathlength is also difficult to treat mathematically and most analytic results so far are restricted to statistical approximations on scale-free and random graphs, at the thermodynamic limit 19. Here, we have taken a considerable step forward by identifying and formally calculating the limits for the average pathlength and for the global efficiency in networks of any size and density.
We provide results for both directed and undirected networks, whether they are sparse (disconnected) or dense (connected), thus delivering solutions that are useful for the whole range of real networks and (di)graph models studied in practice.
We have found that these boundaries are given by specific architectures which we generically named US and UL networks. The optimal configurations are not always unique and may vary according to size or density. US and UL networks are thus characterised by a collection of models as summarised in Fig. We have studied empirical networks from three scientific domains—neural, social and transportation.
The comparison evidences that cortical connectomes are the shortest of the three classes. In fact, they are practically as short as they could possibly be and any alteration of their structure, e. Over the last decade it has been discovered that brain and neural connectomes are organised into modular architectures with the cross-modular paths centralised through a rich-club 21 , 22 , 23 , 24 , Recently, it has been shown that this type of organisation supports complex network dynamics as compared to the capabilities of other hierarchical architectures 26 , Now, we find that cortical connectomes are also quasi-optimal in terms of pathlength.
On the other extreme, transportation networks are more than five times longer than the corresponding lower limit. This contrast between cortical and transportation networks is rather intriguing since both are spatially embedded. While the aim of neural networks might be the rapid and efficient access to information within the network, transportation networks are planar and developed to service vast areas surrounding a city.
Thus they are often characterised by long chains spreading out radially from a rather compact centre. From a practical point of view, our theoretical findings solve the problem of assessing and comparing how short or how long a network is. Evaluating the length of a network with a single number—whether absolute or relative—has strong limitations and often involves arbitrary choices.
The choice of the null-model depends on the particular question we may be asking about the data. A more telling approach is then to display all the results both the empirical observations and the expectation values arising from different null-models, e.
Figure 4 offers a synoptic way to assess the position every network (real or model) takes in the space of global efficiencies, and thereby discloses the relations needed to interpret the results. On the other hand, the description of the full distribution still depends on the choice of a generating mechanism, e. Regarding the likelihood of the US and the UL (di)graph families, for now we can only affirm that the UL limit is a very unlikely outcome in the case of undirected graphs Fig.
The aim of the present paper was the study of the limits for average pathlength and efficiency.