We wrote an article for the book “Learning Communities. Das Internet als neuer Lern- und Wissensraum”, edited by Christina Schachtner and Angelika Höber and published by Campus. More info about the book can be found at Campus and Amazon.
Our contribution, “Visualising information flows in self organising knowledge networks”, can be found [here] (in German).

Last Thursday we were guests of Gerhard Dirmoser, who showed us his impressive collection of diagrams and his diagrammatic library. Gerhard is one of the leading experts in the field of diagrammatics and devotes his work to developing a new epistemological approach to describing and ordering diagrams. This approach is outstanding because it ultimately aims to work without textual description, relying only on diagrammatic relations. The word “description” is therefore probably inappropriate altogether, because in Gerhard’s studio you realize that his research process consists of ordering, relating and placing objects, very similar to Aby Warburg’s Mnemosyne. (See also the German Wikipedia entry on Warburg.)
Aby Warburg revolutionised art history by introducing reproductions for didactic purposes. Nowadays image processing and graph engines can produce new experiences of exploring art. Gerhard Dirmoser’s and Dietmar Offenhuber’s project SemaSpace is exactly about the question of exploring semantically structured data and memory spaces. Dietmar Offenhuber convincingly solved the problem of handling large numbers of nodes, even several thousand, and even when the nodes are represented by images. Here’s a short description of SemaSpace by the authors:
SemaSpace is a fast and easy to use graph editor for large knowledge networks, specially designed for the application in non technical sciences and the arts. It creates interactive graph layouts in 2d and 3d by means of a flexible algorithm. The system is powerful enough for the calculation of complex networks and can incorporate additional data such as images, sounds and full texts.
On the SemaSpace Website you will find not only the tool but also an interesting application:
“25 years of ars electronica
study conducted by Gerhard Dirmoser, contains all projects / people involved in ars electronica until 2003, based on collected material and data from the ars electronica database. original files of the study:”
But SemaSpace is more than an organised database. It represents a “space of memory” that commemorates the threads of theory and media art within the “ars electronica universum.” It can be seen in the tradition of Giulio Camillo’s Memory Theatre (see also http://www.clausmoser.com/?p=378). (By the way, Camillo is a must for interaction designers.)
Dietmar is currently working on a new version of SemaSpace, and Gerhard is about to network his collection of 4000 diagrams within the graph editor. As two thirds of the work has already been done within 20 workdays, this seems an appropriate way to organise large amounts of image data in a reasonable time span.
There’s a lot of other work (texts, diagrams and network graphs) by Gerhard available here: http://www.servus.at/kontext/ARS/ (strongly recommended).
Special hint for us lucky Austrians: next Sunday, February 4, a whole-day lecture takes place at the Audi Max of Danube University Krems.
Here are some remarks on BuzzFeed after having tested it for some days.
First of all, it does what it promises: It feeds you with buzz.
The term buzz itself implies that there is a larger audience behind it. Two mathematicians discussing problems in algebra will not easily become a buzz. Buzz needs a larger number of people talking about a topic and the potential to infect even more people. A buzz from last year is no longer a buzz. BuzzFeed detects buzz before it becomes a bigger thing.

BuzzFeed shows buzz a few moments before its tipping point. From an analytical perspective it is already clear that a new buzz is emerging, but the masses don’t know it yet. BuzzFeed is therefore an adequate means of keeping up with public opinion and of staying a few moments ahead of it. It’s an accelerator of public discourse. But do we really need even more accelerators? Tools like BuzzFeed make it very clear that the blogosphere is a huge discourse machine and that its speed and effectiveness are growing. The whole machinery is based on the simple fact that communication produces communication, sometimes a cascade of communication. But what is the outcome? Doesn’t that lead to an ever more superficial mode of communication? Does it make us fitter to face the challenges of a crazy and complex world, or is it just another step towards making it a bit more crazy and complex? To be honest, I don’t know.
The bottom line, beyond all sociological considerations (sorry, I couldn’t resist): BuzzFeed is simply a vanguard medium. It typically combines the following components:
1. Consumer Generated Media, mainly weblogs, that provide ever new content; in this case CGM works like an armada of journalists chasing the latest news. Or to put it in other words: they work like sensory neurons in our nervous system that fire when they perceive a stimulus.
2. Analytical tools detect trends within the blogosphere. In the case of BuzzFeed these tools detect upcoming topics as patterns. Patterns mean that there’s not a chaotic sequence of “firing neurons” but there’s something going on; something that needs further interpretation.
3. Obviously these patterns are not self-explanatory and require some training to interpret. Therefore BuzzFeed hires editors to separate the wheat from the chaff and to write short introductory texts on topics they consider to be upcoming and interesting enough to get featured.
BuzzFeed is therefore a hybrid medium that combines a very large network of writers, computational power and human judgement. The latter does not seem to be replaceable by technology and is still the key factor that makes a project juicy. We may expect many more interesting combinations of these three components that make up new web media formats, including not only text but also podcasts and video.
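The second component, detecting an upcoming topic as a pattern, can be illustrated with a small sketch. How BuzzFeed actually does this is not described here; the function name `emerging_topics`, the seven-day window, the threshold and the sample counts are all assumptions made up for illustration. The idea is simply to flag a topic when today’s number of blog mentions is far above its recent average.

```python
from statistics import mean, pstdev


def emerging_topics(daily_counts, window=7, threshold=3.0):
    """Flag topics whose latest mention count is far above their recent level.

    daily_counts maps a topic to a list of mentions per day (oldest first).
    This is a hypothetical heuristic, not BuzzFeed's actual method.
    """
    flagged = []
    for topic, counts in daily_counts.items():
        if len(counts) <= window:
            continue  # not enough history to judge
        history, latest = counts[-window - 1:-1], counts[-1]
        baseline = mean(history)
        spread = max(pstdev(history), 1.0)  # floor avoids division by zero
        score = (latest - baseline) / spread
        if score >= threshold:
            flagged.append((topic, score))
    # strongest signal first; editors would then decide what gets featured
    return sorted(flagged, key=lambda item: -item[1])


# Made-up sample data: a steady niche topic and a topic that suddenly takes off.
sample = {
    "algebra": [2, 1, 2, 1, 2, 1, 2, 2],
    "new gadget": [3, 4, 3, 5, 4, 6, 9, 30],
}
result = emerging_topics(sample)
print(result)
```

The output of such a filter is only a candidate list; in the combination described above, human editors still do the interpretation.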
Small-World Networks show that even a few random links in a highly regular network may reduce the average distance between vertices dramatically. (The original paper by Duncan J. Watts and Steven Strogatz is very short and easy to understand: http://tam.cornell.edu/SS_nature_smallworld.pdf)

Watts and Strogatz use two different indicators to measure the degree of linkage within large networks:
1. Average Path Length which measures the average distance between two vertices (e.g. persons) in a given net (e.g. world population).
Suppose you want to send a precious present from person A to person B, from one end of the world to the other, while avoiding regular mail services. You only trust friends, and friends of friends. In this case path length means the chain of friendship connections between A and B. If A is a friend of B, the path length is 1. If it is 100, then there are 99 friends between A and B who help in sending the present.
Some pairs of vertices (in our example persons) may need fewer links than A and B, some others may need more.
The Average Path Length is the average number of connections needed to link pairs of vertices in a given network. Thus the Average Path Length is a good indicator of the network’s overall ability to bridge long distances.
It’s a macro indicator and it doesn’t say much about linkage in smaller parts of the network.
2. Clustering Coefficient: additionally, Duncan J. Watts and Steven Strogatz (1998) introduced the clustering coefficient to measure the interconnectedness at the “neighborhood level” of a network. The clustering coefficient reaches its maximum of “1” when all possible links within a neighbourhood actually exist (in other words, when all vertices within a neighbourhood are connected with each other). For details see: http://en.wikipedia.org/wiki/Clustering_coefficient
It is noteworthy that the clustering coefficient does not measure whether there’s a cluster or not. It is based on the assumption that there is already order in the network. In the example of Watts and Strogatz it is a perfectly ordered ring of vertices, each connected to its direct neighbours and the next but one.
Watts and Strogatz demonstrated (see graph) that by introducing only a few random links into such a perfectly ordered structure the average path length decreases dramatically, whereas the clustering coefficient remains almost the same (i.e. very high). A few random links are enough to turn the network into a small world that combines both: the ability to bridge long distances via short paths (a small number of links) and a dense web of edges within a neighbourhood (high clustering).
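The effect can be reproduced with a few lines of Python. The following is a minimal sketch, not the code used in the original paper: it builds the ordered ring (each vertex linked to its direct neighbours and the next but one), rewires a small share of the edges at random, and computes both indicators. The network size of 200 vertices, the rewiring probability of 0.05 and the random seed are arbitrary assumptions chosen for illustration.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n vertices, each linked to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, p, rng):
    """Return a copy in which each edge is rewired with probability p."""
    n = len(adj)
    new = {v: set(nbrs) for v, nbrs in adj.items()}
    for v in range(n):
        for w in sorted(adj[v]):
            if w < v or rng.random() >= p:
                continue  # visit each undirected edge once; keep most edges
            free = [t for t in range(n) if t != v and t not in new[v]]
            if free:
                t = rng.choice(free)
                new[v].discard(w)
                new[w].discard(v)
                new[v].add(t)
                new[t].add(v)
    return new


def average_path_length(adj):
    """Mean shortest-path length over all reachable pairs (one BFS per vertex)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering_coefficient(adj):
    """Average share of existing links among each vertex's neighbours."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # fewer than two neighbours: counts as 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


rng = random.Random(42)
lattice = ring_lattice(200, 4)
small_world = rewire(lattice, 0.05, rng)
for name, graph in (("ordered ring", lattice), ("rewired", small_world)):
    print(name, round(average_path_length(graph), 2),
          round(clustering_coefficient(graph), 2))
```

For the ordered ring the clustering coefficient is exactly 0.5, because three of the six possible links among each vertex’s four neighbours exist. Pairs that become unreachable after rewiring are simply left out of the average, which is a simplification.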
If we tried to apply the clustering coefficient to an application like the MemeMapper, it would imply that we needed to define the neighbourhood of Weblogs in advance. The clustering coefficient would allow us to measure whether a predefined set of Weblogs (a predefined cluster of Weblogs) is highly clustered or not (in other words, whether there is a dense web of edges between vertices or whether they are rather loosely linked).
Special algorithms like Mark Newman’s “Fast algorithm for detecting community structure in networks” can be used to detect clusters. Newman’s algorithm was used in Vizster, developed by Jeffrey Heer and Danah Boyd.
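To give an idea of how such cluster detection works, here is a naive sketch in the spirit of Newman’s greedy approach: every vertex starts as its own community, and the pair of connected communities whose merger changes the modularity Q the most is merged repeatedly, while the best partition seen so far is remembered. It is neither Newman’s optimised implementation nor the Vizster code; the function name `greedy_modularity` and the toy network of two triangles joined by one edge are assumptions for illustration.

```python
from collections import defaultdict


def greedy_modularity(edges):
    """Greedily merge communities; return (best partition, its modularity Q).

    Assumes an undirected graph without self-loops or duplicate edges and
    vertex labels that can be compared with each other (e.g. all integers).
    """
    m = len(edges)
    e = defaultdict(lambda: defaultdict(float))  # e[i][j]: share of edge ends between i and j
    a = defaultdict(float)                       # a[i]: share of edge ends attached to i
    for u, v in edges:
        e[u][v] += 1 / (2 * m)
        e[v][u] += 1 / (2 * m)
        a[u] += 1 / (2 * m)
        a[v] += 1 / (2 * m)
    members = {v: {v} for v in a}                # every vertex starts alone
    q = -sum(x * x for x in a.values())          # Q = sum(e_ii - a_i^2) with all e_ii = 0
    best_q, best = q, [set(c) for c in members.values()]
    while True:
        gains = [(2 * (e[i][j] - a[i] * a[j]), i, j)
                 for i in e for j in e[i] if i < j]
        if not gains:
            break                                # no connected pair left to merge
        dq, i, j = max(gains)                    # merger with the largest change in Q
        for k, share in list(e[j].items()):      # fold j's links into i
            del e[k][j]
            if k != i:
                e[i][k] += share
                e[k][i] += share
        del e[j]
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
        if q > best_q:
            best_q, best = q, [set(c) for c in members.values()]
    return best, best_q


# Toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities, q = greedy_modularity(edges)
print(sorted(sorted(c) for c in communities), round(q, 3))
```

Unlike the clustering coefficient, this procedure needs no predefined neighbourhoods: the clusters come out of the link structure itself.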
For details see:
The original paper by Watts/Strogatz: Collective dynamics of ‘small-world’ networks
http://en.wikipedia.org/wiki/Small-world_network
http://en.wikipedia.org/wiki/Clustering_coefficient
Implicit Structure and the Dynamics of Blogspace was written by Eytan Adar, Lada Adamic, Li Zhang, and Rajan Lukose, from HP Information Dynamics Lab.
It’s quite an early paper (2004), and it seems that its authors started at more or less the same time (spring 2003) as we built the first Blogosphere Map prototype. Whereas we were focusing on the aesthetics of diffusion mapping, the IDL focused clearly on its analytics. The work done is quite impressive, as it poses the relevant questions for the first time:
How can we analyse infection paths when there’s no explicit information about how news (represented by a URL) travelled through the blogosphere (because there are only a few “via” links)? How can we infer infection routes? How can we measure similarity between blogs in order to infer infection routes?
The authors not only posed the right questions but also gave competent answers by formulating measures such as blog_similarity and iRank. This opens up a wide field of further research, for example on the relative weight of link_similarity between Weblogs versus text_similarity versus infection timing when inferring infection routes. Other methods can probably be found as well.
In any case the paper showed that there are methods to map the general collaborative structure of the blogosphere by identifying general (i.e. more likely) trails of infection, and that it is possible to infer infection routes by embedding explicit links in those general trails of infection.
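The basic idea of inferring a route without “via” links can be sketched in a few lines. This is a strong simplification and not the paper’s method: it uses only two of the signals mentioned above, link overlap and infection timing, and assumes that a blog most likely picked up a URL from the most similar blog that mentioned it earlier. The function names, the cosine measure on link sets and the sample blogs are assumptions for illustration.

```python
from math import sqrt


def link_similarity(links_a, links_b):
    """Cosine similarity between two sets of outgoing links."""
    if not links_a or not links_b:
        return 0.0
    return len(links_a & links_b) / sqrt(len(links_a) * len(links_b))


def infer_routes(blog_links, infection_time):
    """Guess, for each infected blog, the earlier blog it most likely got the URL from.

    blog_links maps a blog to the set of sites it links to;
    infection_time maps a blog to the time it first mentioned the URL.
    Returns blog -> inferred source, or None if no plausible source exists.
    """
    routes = {}
    for blog, t in infection_time.items():
        # only blogs infected earlier can be the source
        earlier = sorted((b for b in infection_time if infection_time[b] < t),
                         key=infection_time.get)
        scored = [(link_similarity(blog_links[blog], blog_links[b]), b)
                  for b in earlier]
        scored = [(s, b) for s, b in scored if s > 0]
        routes[blog] = max(scored)[1] if scored else None
    return routes


# Made-up sample: two pairs of blogs with overlapping link sets.
blog_links = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "w"},
    "c": {"q", "r"},
    "d": {"q", "r", "s"},
}
infection_time = {"a": 1, "c": 2, "b": 3, "d": 4}
routes = infer_routes(blog_links, infection_time)
print(routes)
```

A more serious version would weigh several similarity measures against each other, which is exactly the open research question mentioned above.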
related:
K-means clustering
Wikipedia: Clusteranalyse/k-means (German)
The K-means demo explains the method quite clearly.
Kruskal-Wallis Test
TF-IDF scheme (German)
Support Vector Machine (SVM), try it out
A better introduction than Wikipedia
LIBSVM — A Library for Support Vector Machines (used for this paper) Introduction for SVM-beginners by the creators of LIBSVM
Graphviz was used to generate graphs.
Zoomgraph
During summer holidays in Spain I found time to read Barabási’s network bible “Linked”.
I focus on personal remarks, as there is a good book review available.
Further information about the book can be found here.
As an adherent of self-organisation theory I welcome most of the findings presented in the book. The “new” network theory seems to provide a general toolbox for looking at a variety of systems: technical networks such as the internet as well as the nervous system or social relations.
This was already a promise of cybernetics and later, in the 1980s and 1990s, of various self-organisation theories. I myself tried very hard to apply self-organisation theories in the field of media theory (see thesis), but looking at it now in the light of network theory I have to admit that I got stuck on a descriptive level. I often needed to refer to analogies simply because the appropriate analysis tools were not available at that time. Although analogies are very important for learning and understanding new knowledge domains, they are problematic at a scientific level, especially when you try to explain one domain with the vocabulary of another. This is why Humberto Maturana, who coined the term “autopoiesis” in the field of neurobiology, was not very happy about the German sociologist Niklas Luhmann, who wrote a phalanx of thick books about the “autopoiesis” of social systems. Maturana objected that this was not an adequate application of his theory.
The main reason for the emergence of new network theories lies in the fact that the information age produces a flood of data. E-mail archives, newsgroups and the web provide a huge database that stores human communication. Until the emergence of the internet, human communication was very ephemeral. In order to study communication or social systems you either had to refer to rather poor written sources like books or letters, or you had to design tests, questionnaires, or other kinds of artificial research environments. Now the data is out there and you simply need to harvest it and verify your research hypothesis.
Time will tell which kinds of research questions can be answered by data-based network analysis. My guess is that its unique role lies in its ability to tell us interesting things about systems not only at an intellectual level but also in a form that appeals to our senses. Network analysis also implies a new form of scientific aesthetics that might pave the way for new forms of holistic understanding, which we urgently need to cope with the challenges of the 21st century like global warming, poverty, “terrorism” and so on. It will finally result in new forms of maps that might extend our comprehension of complex processes and our intellectual capabilities to interact with them.
In our MemeMapper project we will try to make some (hopefully bigger) steps in that direction. Therefore we appreciate requests from network researchers in order to harvest