SumVis is an interactive summary and visualization tool for large-scale graphs. It was completed in 2016 as part of my computer science undergraduate honors research thesis, advised by Professor Christos Faloutsos.
Given a very large undirected graph (on the order of thousands or millions of edges), how can we break it up into a small number of important aspects to display to a user? And how can we determine what to show or hide in our visualization? Our proposed solution is SumVis, an interactive graph visualization tool that summarizes a large graph by visualizing it in a clear and succinct manner. The tool lets users interact with a condensed form of an input graph and uses ‘glyphs’ to represent the graph’s constituent subgraphs. Spy plot and node-link diagram views of the input graph are integrated with the visualization to provide context for each glyph by showing where it is situated relative to the full data.
Finally, to set SumVis apart from existing graph visualization tools, which become illegible on graphs containing more than a few hundred edges, our implementation displays no more than 5 glyphs onscreen at a time.
SumVis takes an undirected graph, represented as a comma-separated file of (source, destination, edge weight) tuples, and passes it for analysis to VoG, a scalable graph summary algorithm that extracts a set of (possibly overlapping) subgraphs from an input graph. Once VoG has identified all the substructures in the graph, its output is processed and visualized in Processing.
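As a rough illustration of the input format described above, the following sketch reads (source, destination, edge weight) rows into an undirected adjacency map. It is not the SumVis or VoG loader: the function name `load_weighted_edges`, the treatment of node identifiers as strings, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def load_weighted_edges(lines):
    """Parse (source, destination, weight) rows into an undirected adjacency map.

    Hypothetical helper: node identifiers are kept as strings and rows with
    fewer than three fields are skipped.
    """
    adjacency = defaultdict(dict)
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        source, destination = row[0].strip(), row[1].strip()
        weight = float(row[2])
        # The graph is undirected, so record the edge in both directions.
        adjacency[source][destination] = weight
        adjacency[destination][source] = weight
    return dict(adjacency)


# Made-up sample data in the (source, destination, edge weight) layout.
sample = io.StringIO("1,2,1.0\n2,3,0.5\n1,3,2.0\n")
graph = load_weighted_edges(sample)
```

For a file on disk, the same function could be called with an open file handle instead of the `io.StringIO` sample.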
The original graph's constituent subgraphs (such as cliques, stars, and chains) are represented as ‘glyphs’. If two subgraphs have any nodes in common, their glyphs are connected by ‘regions’, which are edges drawn with thicker line weights and lower opacities. Spy plot and node-link diagram views supplement the visualization; the views can be switched using the ‘Show Hairball’/‘Show Spy Plot’ buttons. Selecting a glyph highlights its location on the spy plot and node-link diagram.
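The rule for connecting glyphs can be sketched as a pairwise set intersection over the node sets of the extracted subgraphs. This is only an assumed reading of the description above, not the SumVis source: the function name `shared_node_regions`, the subgraph labels, and the node sets are hypothetical.

```python
from itertools import combinations


def shared_node_regions(subgraphs):
    """Return {(name_a, name_b): shared_nodes} for every pair of subgraphs
    that has at least one node in common."""
    regions = {}
    for (name_a, nodes_a), (name_b, nodes_b) in combinations(subgraphs.items(), 2):
        shared = set(nodes_a) & set(nodes_b)
        if shared:
            regions[(name_a, name_b)] = shared
    return regions


# Made-up subgraphs: the star and the clique share node 4, the chain is disjoint.
subgraphs = {
    "star_0": {1, 2, 3, 4},
    "clique_0": {4, 5, 6},
    "chain_0": {7, 8, 9},
}
regions = shared_node_regions(subgraphs)
```

Each key in `regions` corresponds to one pair of glyphs that would be linked; the size of the shared set could, for instance, drive the line weight, though the source does not say how that weight is chosen.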
When the user has a glyph selected, they can roll over another glyph to compare the two glyphs' points on the spy plot. The spy plot points for the first glyph are highlighted in red, while the points for the second glyph are highlighted in blue. If the two glyphs share edges, the corresponding spy plot points are highlighted in purple.
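The colouring rule for the comparison can be expressed as a small classification over the two glyphs' edge sets. The sketch below is an assumption about how such a rule might be written, not the actual SumVis code; `normalize` and `spy_plot_colors` are hypothetical names, and edges are treated as unordered pairs because the graph is undirected.

```python
def normalize(edge):
    """Order an undirected edge's endpoints so (u, v) and (v, u) compare equal."""
    u, v = edge
    return (u, v) if u <= v else (v, u)


def spy_plot_colors(selected_edges, hovered_edges):
    """Map each edge to 'red' (selected glyph only), 'blue' (hovered glyph
    only), or 'purple' (present in both glyphs)."""
    first = {normalize(e) for e in selected_edges}
    second = {normalize(e) for e in hovered_edges}
    colors = {}
    for edge in first | second:
        if edge in first and edge in second:
            colors[edge] = "purple"
        elif edge in first:
            colors[edge] = "red"
        else:
            colors[edge] = "blue"
    return colors


# Made-up example: edge (2, 3) belongs to both glyphs.
colors = spy_plot_colors([(1, 2), (2, 3)], [(3, 2), (3, 4)])
```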
SumVis was used to visualize the degeneracy-cores of real large-scale network datasets obtained from the Stanford Network Analysis Platform (SNAP). Findings were reported on an Internet traffic network and an email communication network.
The Internet traffic network consisted mainly of stars and full-cliques, with many overlapping nodes and occasionally overlapping edges.
The vast majority of structures in the email communication network were stars. Some small chains, small near-bipartite cores, and one large full-clique were also found. The small chains in the graph are particularly interesting, as they suggest the distribution of chain letters, which could be an indication of fraudulent activity.
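The source does not say how the degeneracy-cores of the datasets were computed. Assuming "degeneracy-core" means the k-core with the largest possible k, a minimal sketch of the standard peeling approach looks like this. The function name `degeneracy_core` and the example graph are hypothetical, and this simple version favours clarity over speed, so it would be slow on graphs with millions of edges.

```python
def degeneracy_core(adjacency):
    """Return (k, nodes) for the k-core with the largest k.

    `adjacency` maps each node to the set of its neighbors and must be
    symmetric (undirected), with every node present as a key.
    """
    remaining = {node: set(neighbors) for node, neighbors in adjacency.items()}
    best_k, best_core = -1, set()
    while remaining:
        # Peel the node with the smallest degree in the remaining graph.
        node = min(remaining, key=lambda n: len(remaining[n]))
        k = len(remaining[node])
        if k > best_k:
            # The first time the minimum degree reaches a new high, the
            # nodes still present form the k-core for that k.
            best_k, best_core = k, set(remaining)
        for neighbor in remaining.pop(node):
            remaining[neighbor].discard(node)
    return best_k, best_core


# Made-up example: a triangle (1, 2, 3) with a pendant node 4 attached to 3.
example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
k, core = degeneracy_core(example)
```

In this example the pendant node is peeled away first, leaving the triangle as the 2-core.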