- GraphML (.graphml)
- GraphViz (.dot)
- Adjacency matrix (.net, .txt)
- Pajek-like (.net)
- UCINET's Data Language (.dl)
Each GraphML document is written in a special form of XML and defines a graph. For instance, the document below defines a graph with 11 nodes and 12 edges:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="undirected">
<node id="n0"/>
<node id="n1"/>
<node id="n2"/>
<node id="n3"/>
<node id="n4"/>
<node id="n5"/>
<node id="n6"/>
<node id="n7"/>
<node id="n8"/>
<node id="n9"/>
<node id="n10"/>
<edge source="n0" target="n2"/>
<edge source="n1" target="n2"/>
<edge source="n2" target="n3"/>
<edge source="n3" target="n5"/>
<edge source="n3" target="n4"/>
<edge source="n4" target="n6"/>
<edge source="n6" target="n5"/>
<edge source="n5" target="n7"/>
<edge source="n6" target="n8"/>
<edge source="n8" target="n7"/>
<edge source="n8" target="n9"/>
<edge source="n8" target="n10"/>
</graph>
</graphml>
All GraphML files consist of a graphml element and a variety of subelements: graph, node, edge, key. SocNetV understands all of them.
Nodes are defined by the <node id="n1" /> element, where id is a unique node identification string. This id is used in the edge declarations, described below.
Edges are defined by the <edge source="n1" target="n1" /> element, where source and target must be equal to existing node ids.
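A quick way to check the node and edge counts of a GraphML document is to read it with a generic XML library. The sketch below uses Python's standard xml.etree.ElementTree on a shortened, hypothetical three-node document. The function name read_graphml is made up for this illustration, and this is not how SocNetV itself loads the file.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace (the xmlns attribute of <graphml>).
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Shortened, hypothetical document used only for illustration.
GRAPHML = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <edge source="n0" target="n2"/>
    <edge source="n1" target="n2"/>
  </graph>
</graphml>
"""

def read_graphml(text):
    """Return (node ids, (source, target) pairs) of the first graph element."""
    root = ET.fromstring(text.encode("utf-8"))
    graph = root.find("g:graph", NS)
    nodes = [n.get("id") for n in graph.findall("g:node", NS)]
    edges = [(e.get("source"), e.get("target")) for e in graph.findall("g:edge", NS)]
    return nodes, edges

nodes, edges = read_graphml(GRAPHML)
print(len(nodes), "nodes,", len(edges), "edges")
```

Every source and target value should appear in the list of node ids; a file where that does not hold refers to a node that was never declared.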
The GraphViz (.dot) format is the file format of the GraphViz layout package. Unfortunately, I have not yet managed to implement the whole specification of this nice format. The features that SocNetV recognizes are shown in the following example:
digraph mydot {
node [color=red, shape=box];
a -> b -> c ->d
node [color=pink, shape=circle];
d->e->a->f->j->k->l->o
[weight=1, color=black];
}
Nodes are defined by the "node" declaration, in which you can define the colour and the shape of the nodes that follow. Each link between node labels is denoted by "->" in directed graphs (digraphs) and by "--" in undirected graphs (graphs). For instance, "a -> b" means a directed edge from a to b. Moreover, links can have weights and colours.
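To make the chained notation concrete, here is a minimal Python sketch that expands one directed chain into its individual edges. The helper name chain_to_edges is hypothetical, it handles only the "->" operator, and it simply discards any attribute list; it is not SocNetV's parser.

```python
import re

def chain_to_edges(statement):
    """Split a DOT edge chain such as 'a -> b -> c' into (source, target) pairs."""
    # Drop an attribute list like [weight=1, color=black] and a trailing ';'.
    statement = re.sub(r"\[.*?\]", "", statement).strip().rstrip(";")
    names = [name.strip() for name in statement.split("->")]
    # Pair every node label with its successor in the chain.
    return list(zip(names, names[1:]))

edges = chain_to_edges("a -> b -> c ->d")
print(edges)
```

A chain of k labels therefore yields k-1 directed edges.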
The adjacency sociomatrix format is a very simple one.
It describes one-mode networks and contains a plain NxN matrix, where N is the number of nodes. Each (i,j) element is a number.
If (i,j)=0 then nodes i and j are not connected.
If (i,j)=x, where x is a non-zero number, then there is an arc from node i to node j.
Negative weights are allowed. They are depicted as dashed lines when the network is visualised on the canvas.
This is an example of an adjacency sociomatrix formatted network:
0000000011
0000101100
0001100000
0010000010
0110001000
0000001100
0100110001
0100010000
1001000000
1000001000
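The reading rule for a sociomatrix (a non-zero cell (i,j) is an arc from i to j) can be sketched in a few lines of Python. This sketch assumes the cells of each row are separated by whitespace and that nodes are numbered from 1; the function name and the 3x3 sample are hypothetical.

```python
def adjacency_to_arcs(text):
    """Return (i, j, weight) for every non-zero cell; nodes are numbered from 1."""
    arcs = []
    for i, row in enumerate(text.strip().splitlines(), start=1):
        for j, cell in enumerate(row.split(), start=1):
            weight = float(cell)
            if weight != 0:
                arcs.append((i, j, weight))
    return arcs

# Hypothetical 3x3 sociomatrix with one negative weight.
MATRIX = """
0 1 0
0 0 -2
1 0 0
"""
arcs = adjacency_to_arcs(MATRIX)
print(arcs)
```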
Unlike one-mode networks, which describe direct links between actors of the same type, two-mode networks describe either two sets of actors or a set of actors and a set of associated events.
In the first case, usually called a dyadic two-mode network, there are two sets of actors. The sociomatrix codifies the relations between actors in the first set and actors in the second set.
In the second case, usually called an affiliation network, there is a set of actors and a set of events or organizations. The sociomatrix measures the attendance or affiliation of the actors (first mode) with a particular event or organization (second mode).
Two-mode networks are described by affiliation network matrices, where A(i,j) codes the events/organizations each actor is affiliated with.
A two-mode sociomatrix is an NxM matrix, where N is the number of actors and M is the number of events. Each (i,j) element can be 0 or 1.
If A(i,j)=1 then actor i is affiliated with event j.
This is an example of a two-mode sociomatrix formatted network:
0 0 1 1 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 1 1 0 0 0 1
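As an illustration of how such a matrix is read, the following Python sketch lists the events each actor is affiliated with. It assumes one row per actor and one whitespace-separated column per event, both numbered from 1; the function name and the small 3x4 sample are hypothetical.

```python
def affiliations(text):
    """Map each actor (row, numbered from 1) to the events it is affiliated with."""
    result = {}
    for actor, row in enumerate(text.strip().splitlines(), start=1):
        cells = [int(c) for c in row.split()]
        result[actor] = [event for event, c in enumerate(cells, start=1) if c == 1]
    return result

# Hypothetical matrix: 3 actors (rows) and 4 events (columns).
TWO_MODE = """
1 0 1 0
0 1 1 0
0 0 0 1
"""
aff = affiliations(TWO_MODE)
print(aff)
```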
Note the 'Pajek-like' part: real Pajek files can be much more complicated than the ones recognised by SocNetV. To be more precise, here is an example of the Pajek-like form that SocNetV understands. The numbers on the left only indicate line numbers.
1) *Network
2) *Vertices 6
3) 1 "pe0" ic LightGreen 0.5 0.5 box
4) 2 "pe1" ic LightYellow 0.8473 0.4981 ellipse
5) 3 "pe2" ic LightYellow 0.6112 0.8387 triangle
6) 4 "pe3" ic LightYellow 0.201 0.7205 diamond
7) 5 "pe4" ic LightYellow 0.2216 0.2977 ellipse
8) 6 "pe5" ic LightYellow 0.612 0.1552 circle
9) *Arcs
10) 1 2 1 c black
11) 1 3 -1 c red
12) 2 4 1 c black
13) 3 5 1 c black
14) *Edges
15) 6 4 1 c black
16) 5 6 1 c yellow
Let me analyse this a little bit:
The first line (*Network) declares that this is a Pajek network.
The second line (*Vertices 6) declares the number of vertices of the network and identifies that the following lines describe node properties.
Each one of the following 6 lines (3-8) constructs one node. Each node's line has 7 columns (properties):
Column 1 denotes the node's number.
Column 2 denotes the node's label.
Column 3 indicates that the next column carries the colour of the node's shape.
Column 4 denotes the colour of the node's shape.
Column 5 denotes the proportional X coordinate of the specific node on the canvas.
Column 6 denotes the proportional Y coordinate of the specific node on the canvas.
Column 7 denotes the node's shape.
Line 9 (*Arcs) identifies that the following lines will describe arcs from one node to another. Each one of the lines 10-13 constructs one arc. For instance, Line 10 constructs an arc from node 1 to node 2 with weight 1 and black colour.
Line 14 (*Edges) identifies that the following lines will describe edges (double arcs) between nodes. Each one of the lines 15-16 constructs one edge. For instance, Line 15 constructs an edge between node 6 and node 4 with weight 1 and black colour.
Note that it is legal to have mixed columns in a Pajek-like network file. For instance, you can have a node specification line like this:
4 "label" 0.201 0.7205 ic LightYellow diamond
Also, it is not necessary to declare X and Y coordinates or colours and shapes. In that case SocNetV will use the defaults, that is, red diamonds scattered randomly across the canvas. Nevertheless, the first two columns must be a valid node number and label.
Note also that weights might be negative as in line 11. Negative weights are depicted as dashed lines on the canvas.
Colour names are not arbitrarily created. Valid colour names for nodes and arcs/edges are those specified in the X11 file: /usr/X11R6/lib/X11/rgb.txt, i.e. red, gray, violet, navy, green, etc. You can change colours of all network elements from inside SocNetV.
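The section structure described above (*Vertices, then *Arcs and *Edges) can be sketched as a small Python reader. This is an assumption-laden illustration, not SocNetV's own parser: the function name read_pajek_like is made up, the sample is a shortened three-node variant of the listing, the input carries no line-number prefixes, and colours, coordinates and shapes are ignored.

```python
import shlex

def read_pajek_like(text):
    """Collect nodes, arcs and edges from a Pajek-like listing (no line numbers)."""
    nodes, arcs, edges = {}, [], []
    section = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section markers: *Network, *Vertices N, *Arcs, *Edges
            section = line.split()[0].lower()
            continue
        fields = shlex.split(line)  # keeps a "quoted label" as one field
        if section == "*vertices":
            nodes[int(fields[0])] = fields[1]
        elif section in ("*arcs", "*edges"):
            link = (int(fields[0]), int(fields[1]), float(fields[2]))
            (arcs if section == "*arcs" else edges).append(link)
    return nodes, arcs, edges

# Shortened, hypothetical variant of the example file.
SAMPLE = '''
*Network
*Vertices 3
1 "pe0" ic LightGreen 0.5 0.5 box
2 "pe1" ic LightYellow 0.8473 0.4981 ellipse
3 "pe2" ic LightYellow 0.6112 0.8387 triangle
*Arcs
1 2 1 c black
1 3 -1 c red
*Edges
2 3 1 c black
'''
nodes, arcs, edges = read_pajek_like(SAMPLE)
print(nodes, arcs, edges)
```

Because only the first two columns of a node line are read by position, this sketch is indifferent to the order of the remaining columns.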
SocNetV also supports Pajek files which declare edges/arcs in matrices, like this:
*Vertices 11
1 "minister1" 0.2912 0.2004 ellipse
2 "pminister" 0.4875 0.0153 diamond
3 "minister2" 0.3537 0.3416 ellipse
4 "minister3" 0.4225 0.5477 ellipse
5 "minister4" 0.4538 0.1603 ellipse
6 "minister5" 0.4900 0.3836 ellipse
7 "minister6" 0.6212 0.5038 ellipse
8 "minister7" 0.6450 0.2023 ellipse
9 "advisor1" 0.6488 0.6031 box
10 "advisor2" 0.3212 0.5515 box
11 "advisor3" 0.7188 0.4218 box
*Matrix
0 1 1 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0
1 1 0 1 0 1 1 1 0 0 0
0 0 0 0 0 0 1 1 0 0 0
0 1 0 1 0 1 1 1 0 0 0
0 1 0 1 1 0 1 1 0 0 0
0 0 0 1 0 0 0 1 1 0 1
0 1 0 1 0 0 1 0 0 0 1
0 0 0 1 0 0 1 1 0 0 1
1 0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 1 0 1 1 0 0
Here, the *Matrix tag replaces *Arcs or *Edges. An ordinary adjacency matrix follows describing all links.
Another possibility is the *Arcslist tag. When SocNetV finds that tag in a Pajek file, it expects each node to declare a list of its links to other nodes. Here is an example:
*Vertices 9
1
2
3
4
5
6
7
8
9
*Arcslist
2 1 3 9
1 3 4 5
3 1 4 7
4 1 2 3
5 1 3 4
7 2 8 9
For instance, the first line after *Arcslist means: "node 2 is connected to nodes 1, 3 and 9". It is very simple.
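The same reading rule, written as a short Python sketch: the first number on each row is the source node and every following number is a target. The function name arcslist_to_arcs is hypothetical and the input is assumed to be the rows that follow the *Arcslist tag.

```python
def arcslist_to_arcs(lines):
    """Expand '*Arcslist' rows: first number is the source, the rest are targets."""
    arcs = []
    for line in lines:
        numbers = [int(tok) for tok in line.split()]
        if not numbers:
            continue  # skip blank rows
        source, targets = numbers[0], numbers[1:]
        arcs.extend((source, target) for target in targets)
    return arcs

arcs = arcslist_to_arcs(["2 1 3 9"])
print(arcs)
```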
UCINET's DL format is one of the easiest to understand. For the moment, we support only FULL MATRIX mode. Each file starts with the "DL" mark; then the number N of nodes is declared, followed by the format (i.e. whether a diagonal is present or not). Then, after the "LABELS:" mark, we read the labels of each node line by line. That is, if N is 100 then we expect to read 100 labels. Finally, after the "DATA:" mark, a DL file declares the network data, which holds only the edges. For instance, the network below contains 4 nodes and 7 arcs/edges:
DL N=4
FORMAT = FULLMATRIX DIAGONAL PRESENT
LABELS:
On the normalization and visualization of author co-citation data:Salton's cosine versus the Jaccard index
Caveats for the use of citation indicators in research and journalevaluations
Should co-occurrence data be normalized? A rejoinder
Home on the range - What and where is the middle in science andtechnology studies?
DATA:
0 0 0.158114 0
0.201234 0 1 0
1 0 0 0
0.1 1 1 0
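The reading order described above (header with N, then N labels, then NxN data values) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name read_dl_fullmatrix is hypothetical, the labels are shortened to A-D for brevity, and the "LABELS:" and "DATA:" marks are assumed to appear exactly once, in upper case.

```python
import re

def read_dl_fullmatrix(text):
    """Parse a FULLMATRIX DL text into (labels, matrix)."""
    header, rest = text.split("LABELS:", 1)
    label_part, data_part = rest.split("DATA:", 1)
    # The header declares the number of nodes as N=<integer>.
    n = int(re.search(r"N\s*=\s*(\d+)", header).group(1))
    labels = [l.strip() for l in label_part.strip().splitlines() if l.strip()]
    values = [float(v) for v in data_part.split()]
    if len(labels) != n or len(values) != n * n:
        raise ValueError("label or data count does not match N")
    # Cut the flat list of values into N rows of N cells.
    matrix = [values[r * n:(r + 1) * n] for r in range(n)]
    return labels, matrix

# Same data as the example, with hypothetical short labels.
SAMPLE = """DL N=4
FORMAT = FULLMATRIX DIAGONAL PRESENT
LABELS:
A
B
C
D
DATA:
0 0 0.158114 0
0.201234 0 1 0
1 0 0 0
0.1 1 1 0
"""
labels, matrix = read_dl_fullmatrix(SAMPLE)
links = sum(1 for row in matrix for v in row if v != 0)
print(labels, links)
```

Counting the non-zero cells of the parsed matrix gives the 7 arcs/edges mentioned in the text.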