Visualising information flows in self organising knowledge networks

We wrote an article for the book “Learning Communities. Das Internet als neuer Lern- und Wissensraum”, edited by Christina Schachtner and Angelika Höber and published by Campus. More information about the book can be found at Campus and Amazon.

Our contribution, “Visualising information flows in self organising knowledge networks”, can be found [here] (in German).

Filed under: data exploration, internal_research, maps, networkanalysis, social networks, theory
Posted: January 8, 2008 at 1:34 pm by Gernot
Tags: none

TouchGraph Photos Facebook and Interactive Friends Graph

TouchGraph Facebook Browser  10_small.jpg
This summer, two applications using Facebook’s API were launched. They can be seen as the first field test of social network analysis with a broader audience of non-technical users. Apart from the actual design of the applications, the reactions of the users, many of whom are encountering social network analysis for the first time, offer interesting lessons.

Design: First of all, it has to be said that both TouchGraph (TG) and the Interactive Friends Graph (IFG) seem to be largely inspired by Danah Boyd’s and Jeffrey Heer’s prototype Vizster. That’s not a criticism, as Vizster was a sound design that addressed most of the questions users might ask themselves when browsing their social network. It’s wise to reuse a design that has proven itself. Clustering and displaying common friends are logical use cases that were already covered by Vizster.
The limited number of use cases, especially for non-scientific users, and the grammar of network graphs, defined by nodes, edges and proven layout algorithms, make it almost impossible to reinvent the wheel. So to a certain extent network graphs cannot avoid looking similar and using a common language.

Common problems: Both applications suffer greatly from the fact that most profile information is not public within Facebook. Even if your friends allow you to see their friends, connections between them easily disappear, as the path simply ends when a user’s profile is hidden. The Interactive Friends Graph tries to overcome this problem with a kind of viral marketing strategy: it has a special function to invite friends to publish their friendship relations in the form of the IFG. This may work to a certain extent, but it remains a barrier for a user who wants to explore a person’s network. TouchGraph tries to solve the problem with another trick: it takes advantage of photo tagging, which allows you to display networks of people tagged in the same photo.
“One can not see another person’s whole social network because Facebook only allows applications to get a list of one’s own friends. For other users it is only possible to get a list of people that they appear in photos with. Perhaps Facebook’s policy will change in the future.”

If you are lucky enough to have a large network of Facebook friends, TouchGraph provides a wonderful tool to explore all the connections between your friends.

Metrics: Facebook’s policy makes sense because it protects people from being stalked, but it has a negative impact on the application of metrics. One must consider that the probability of being cut off rises steeply with each degree of separation from a central point, simply because the connecting person might not be within your network.
“Once one has launched the application, one can explore one’s extended social network by loading more photos for friends. Loading photos will add new users who are tagged in photos to the graph, and created edges between them based on friendships and common photo appearances. Note: It is only possible to load photos for friends and people within one’s network.”
It’s simply ridiculous that Facebook applies the term “network” to people of the same nationality or the same university, and provides far more information about people within one’s network than about people outside it. There’s a structural barrier to becoming friends with people outside your own network, and therefore any metrics will only affirm these restrictions.

Or let’s put it another way: the way blogs link to each other in the blogosphere is far more unpredictable, because it’s easier to escape from national and other social ties, whereas Facebook structurally supports friendship connections within one’s own neighbourhood. One of the main benefits of network graphs seems to lie in providing a tool to go beyond your own neighbourhood and to take advantage of a weak tie between an important actor and yourself.
Within Facebook such explorations are hindered rather than supported.

Nevertheless, the TouchGraph Facebook Photo Browser is certainly among the best WORKING social network analysis tools currently available for online use. It has a variety of fascinating and useful features and is worth trying out, if you have a large Facebook network.

Filed under: networkanalysis, social networks
Posted: September 23, 2007 at 12:34 pm by Gernot
Tags: none

BuzzFeed

Here are some remarks on BuzzFeed after having tested it for a few days.
First of all, it does what it promises: it feeds you with buzz.
The term buzz itself implies that there is a larger audience behind it. Two mathematicians discussing problems in algebra will not easily become a buzz. Buzz needs a larger number of people who talk about it and the potential to infect even more people. A buzz from last year is no longer a buzz. BuzzFeed detects buzz before it becomes a bigger thing.

buzzfeed2.jpg

BuzzFeed shows buzz a few moments before its tipping point. From an analytical perspective it is already clear that a new buzz is emerging, but the masses don’t know it yet. BuzzFeed is therefore an adequate means of keeping up with public opinion and staying a few moments ahead. It’s an accelerator of public discourse. But do we really need even more accelerators? Tools like BuzzFeed make it very clear that the blogosphere is a huge discourse machine and that its speed and effectiveness are growing. The whole machinery is based on the simple fact that communication produces communication, sometimes a cascade of communication. But what is the outcome? Doesn’t that lead to a more and more superficial mode of communication? Does it make us fitter to face the challenges of a crazy and complex world, or is it just another step towards making it a bit more crazy and complex? To be honest, I don’t know.

The bottom line, beyond all sociological considerations (sorry, I couldn’t resist), is that BuzzFeed is simply a vanguard medium. It typically combines the following components:
1. Consumer Generated Media (CGM), mainly weblogs, that provide ever new content; in this case CGM works like an armada of journalists chasing the latest news. Or to put it another way: they work like sensory neurons in our nervous system that fire when they perceive a stimulus.
2. Analytical tools detect trends within the blogosphere. In the case of BuzzFeed these tools detect upcoming topics as patterns. A pattern means that there is not just a chaotic sequence of “firing neurons” but that something is going on, something that needs further interpretation.
3. Obviously these patterns are not self-explanatory and require some training to interpret. Therefore BuzzFeed hires editors to separate the wheat from the chaff and to write short introductory texts on topics they consider upcoming and interesting enough to be featured.

BuzzFeed is therefore a hybrid medium that combines a very large network of writers, computational power and human judgement. The latter does not seem to be replaceable by technology and is still the key factor that makes a project juicy. We may expect many more interesting combinations of these three components that make up new web media formats, including not only text but also podcasts and video.

Filed under: Uncategorized, buzzanalytics, networkanalysis, theory
Posted: December 1, 2006 at 11:24 am by Gernot
Tags: none

Small-world networks

Small-World Networks show that even a few random links in a highly regular network may reduce the average distance between vertices dramatically. (The original paper by Duncan J. Watts and Steven Strogatz is very short and easy to understand: http://tam.cornell.edu/SS_nature_smallworld.pdf)

small_world.jpg

Watts and Strogatz use two different indicators to measure the degree of linkage within large networks:

1. Average Path Length, which measures the average distance between two vertices (e.g. persons) in a given network (e.g. the world population).
Suppose you want to send a precious present from person A to person B, from one end of the world to the other, without using regular mail services. You only trust friends, and friends of friends. In this case the path length is the length of the chain of friendship connections between A and B. If A is a friend of B, the path length is 1. If it is 100, then there are 99 friends between A and B who help in passing on the present.
Some pairs of vertices (in our example, persons) may need fewer links than A and B, others may need more.
The Average Path Length is the average number of connections needed to link pairs of vertices in a given network. Thus the Average Path Length is a good indicator of the network’s overall ability to bridge long distances.
It’s a macro indicator and doesn’t say much about linkage in smaller parts of the network.

2. The clustering coefficient, which Duncan J. Watts and Steven Strogatz (1998) introduced to measure the interconnectedness at the “neighbourhood level” of a network. The clustering coefficient reaches its maximum of 1 when all possible links within a neighbourhood actually exist (in other words, when all vertices within a neighbourhood are connected with each other). For details see: http://en.wikipedia.org/wiki/Clustering_coefficient
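To make the two indicators more tangible, here is a minimal sketch in Python that computes both for a tiny friendship network. The network, the names and the function names are made up for illustration; the clustering coefficient is taken as the average over all vertices, with vertices that have fewer than two neighbours counted as 0 (an assumption of this sketch).

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of links from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_path_length(graph):
    """Mean shortest-path length over all connected pairs of distinct vertices."""
    total = pairs = 0
    for v in graph:
        for w, d in shortest_path_lengths(graph, v).items():
            if w != v:
                total += d
                pairs += 1
    return total / pairs


def clustering_coefficient(graph):
    """Average share of possible links among a vertex's neighbours that exist."""
    values = []
    for v, neighbours in graph.items():
        k = len(neighbours)
        if k < 2:
            values.append(0.0)  # assumption: no neighbourhood to speak of
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        values.append(links / (k * (k - 1) / 2))
    return sum(values) / len(values)


# Hypothetical friendship network: a chain A-B-C-D-E
# plus one extra friendship between A and C.
friends = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

apl = average_path_length(friends)
cc = clustering_coefficient(friends)
print(f"average path length: {apl:.2f}, clustering coefficient: {cc:.2f}")
```

In this toy network A, B and C form a fully linked neighbourhood, while D and E hang off it as a chain, which is why the clustering coefficient ends up well below 1.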

It is noteworthy that the clustering coefficient does not detect whether there is a cluster or not. It is based on the assumption that there is already order in the network. In the example of Watts and Strogatz it is a perfectly ordered ring of vertices, each connected to its direct neighbours and the next but one.
Watts and Strogatz demonstrated (see graph) that by introducing only a few random links into such a perfectly ordered structure, the average path length decreases dramatically whereas the clustering coefficient remains almost the same (i.e. very high). A few random links are enough to turn the network into a small world that combines both: the ability to bridge long distances via short paths (few links) and a dense web of edges within a neighbourhood (high clustering).
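The experiment can be retraced with a short, self-contained Python sketch. It is a simplified reading of the rewiring idea, not the authors' original code: the ring size of 200, the rewiring probabilities and the fixed random seed are arbitrary choices, and the average path length here ignores pairs that rewiring happens to disconnect.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k=2):
    """Ring of n vertices, each linked to its k nearest neighbours on either side."""
    graph = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            graph[v].add(w)
            graph[w].add(v)
    return graph


def rewire(graph, p, rng):
    """Copy the graph; each link is, with probability p, moved to a random new endpoint."""
    new = {v: set(nb) for v, nb in graph.items()}
    edges = sorted((v, w) for v in graph for w in graph[v] if v < w)
    for v, w in edges:
        if rng.random() < p:
            candidates = [u for u in new if u != v and u not in new[v]]
            if candidates:
                u = rng.choice(candidates)
                new[v].remove(w)
                new[w].remove(v)
                new[v].add(u)
                new[u].add(v)
    return new


def average_path_length(graph):
    """Mean number of links on the shortest path, over all connected pairs."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering_coefficient(graph):
    """Average share of possible links among a vertex's neighbours that exist."""
    values = []
    for v, nb in graph.items():
        possible = len(nb) * (len(nb) - 1) / 2
        existing = sum(1 for a, b in combinations(nb, 2) if b in graph[a])
        values.append(existing / possible if possible else 0.0)
    return sum(values) / len(values)


rng = random.Random(42)  # fixed seed (arbitrary) so that a run is repeatable
lattice = ring_lattice(200, k=2)  # direct neighbours and the next but one
for p in (0.0, 0.01, 0.1):
    g = rewire(lattice, p, rng)
    print(p, round(average_path_length(g), 2), round(clustering_coefficient(g), 2))
```

With p = 0 the ring stays perfectly ordered; comparing the printed rows for small p against that baseline shows how the two indicators react to a handful of random links.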

If we tried to apply the clustering coefficient to an application like the MemeMapper, it would mean that we needed to define the neighbourhood of weblogs in advance. The clustering coefficient would allow us to measure whether a predefined set of weblogs (a predefined cluster of weblogs) is highly clustered or not (in other words, whether there is a dense web of edges between vertices or whether they are rather loosely linked).

Special algorithms like Mark Newman’s “Fast algorithm for detecting community structure in networks” can be used to detect clusters. Newman’s algorithm was used in Vizster, developed by Jeffrey Heer and Danah Boyd.

For details see:
The original paper by Watts/Strogatz: Collective dynamics of ‘small-world’ networks
http://en.wikipedia.org/wiki/Small-world_network
http://en.wikipedia.org/wiki/Clustering_coefficient

Filed under: Uncategorized, networkanalysis, theory
Posted: September 19, 2006 at 9:56 am by Gernot
Tags: none

The Bacon Story

netmap.jpg
Barabási, in his book “Linked”, mentions a story about an actor called Bacon who was not particularly well known during his career but became quite popular among network scientists. He’s a good example showing that in a small world (like Hollywood) even less popular actors appear to be hubs; but Bacon’s connectivity is more an attribute of the network and less an attribute of himself. There’s a nice visualisation of that story available at: http://www.netmapanalytics.com/demo.html.

Filed under: Uncategorized, maps, networkanalysis
Posted: September 15, 2006 at 10:48 am by Gernot
Tags: none

Implicit Structure and the Dynamics of Blogspace

Implicit Structure and the Dynamics of Blogspace was written by Eytan Adar, Lada Adamic, Li Zhang, and Rajan Lukose from the HP Information Dynamics Lab (IDL).
It’s quite an early paper (2004), and it seems that its authors started at more or less the same time (spring 2003) as we built the first Blogosphere Map prototype. Whereas we were focusing on the aesthetics of diffusion mapping, the IDL clearly focused on its analytics. The work is quite impressive, as it poses the relevant questions for the first time:
How can we analyse infection paths when there is no explicit information about how news (represented by a URL) travelled through the blogosphere (because there are only a few “via” links)? How can we infer infection routes? How can we measure similarity between blogs in order to infer infection routes?
The authors not only posed the right questions but also gave competent answers by formulating measures like blog_similarity and iRank. The paper opens up a wide field of further research, e.g. more investigation into the relative weight of link_similarity of weblogs versus text_similarity versus infection timing when inferring infection routes. Other methods can probably be found as well.
In any case the paper showed that there are methods to map the general collaborative structure of the blogosphere by identifying general (i.e. more likely) trails of infection, and that it is possible to infer infection routes by embedding explicit links in those general trails of infection.
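To illustrate the basic idea of inferring an infection route from similarity and timing, here is a deliberately naive Python sketch. It is not the paper's blog_similarity or iRank: as a stand-in it uses the Jaccard overlap of outbound link sets and simply guesses, for each blog, the most similar blog that mentioned the URL earlier. All blog names, link sets and dates are invented.

```python
from datetime import date


def jaccard(a, b):
    """Overlap of two sets of outbound links, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_sources(links, infected_on):
    """For each blog, guess the earlier-infected blog most similar to it (or None)."""
    guesses = {}
    for blog, day in infected_on.items():
        earlier = [b for b, d in infected_on.items() if d < day]
        guesses[blog] = max(
            earlier,
            key=lambda b: jaccard(links[blog], links[b]),
            default=None,  # nobody mentioned the URL earlier
        )
    return guesses


# Hypothetical data: what each blog usually links to ...
links = {
    "blog_a": {"x.example", "y.example", "z.example"},
    "blog_b": {"x.example", "y.example"},
    "blog_c": {"x.example", "y.example", "w.example"},
}
# ... and the day on which each blog first mentioned the URL in question.
infected_on = {
    "blog_a": date(2004, 1, 1),
    "blog_b": date(2004, 1, 2),
    "blog_c": date(2004, 1, 3),
}

guesses = infer_sources(links, infected_on)
print(guesses)
```

A real system would have to weigh several signals against each other (link similarity, text similarity, timing), which is exactly the open question raised above.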
related:
K-means clustering
Wikipedia: Clusteranalyse/k-means
K-means-demo explains the method quite clearly.
Kruskal-Wallis Test
TFIDF Scheme (German)
Support Vector Machine (SVM), try out
better introduction than Wikipedia
LIBSVM — A Library for Support Vector Machines (used for this paper) Introduction for SVM-beginners by the creators of LIBSVM
Graphviz was used to generate graphs.
Zoomgraph

Filed under: Uncategorized, maps, networkanalysis, theory
Posted: September 8, 2006 at 11:24 am by Gernot
Tags: none

Barabási’s “Linked” and the MemeMapper

During the summer holidays in Spain I found time to read Barabási’s network bible “Linked”.
I will focus on personal remarks, as there is a good book review available.
Further information about the book can be found here.

As an adherent of self-organisation theory I welcome most of the findings presented in the book. The “new” network theory seems to provide a general toolkit for looking at a variety of systems: technical networks like the internet as well as the nervous system or social relations.

This was already the promise of cybernetics and later, in the 1980s and 1990s, of different self-organisation theories. I myself tried very hard to apply self-organisation theories in the field of media theory (see thesis), but looking at it now in the light of network theory I have to admit that I got stuck at a descriptive level. I often needed to resort to analogies simply because the appropriate analysis tools were not available at that time. Although analogies are very important for learning and understanding new knowledge domains, they are problematic at a scientific level, especially when you try to explain one domain with the vocabulary of another. That is why Humberto Maturana, who coined the term “autopoiesis” in the field of neurobiology, was not very happy about the German sociologist Niklas Luhmann, who wrote a phalanx of thick books about the “autopoiesis” of social systems. Maturana objected that this was not an adequate application of his theory.

The main reason for the emergence of new network theories lies in the fact that the information age produces a flood of data. E-mail archives, newsgroups and the web provide a huge database that stores human communication. Until the emergence of the internet, human communication was very ephemeral. In order to study communication or social systems you either had to rely on rather poor written sources like books or letters, or you had to design tests, questionnaires, or other kinds of artificial research environments. Now the data is out there and you simply need to harvest it and test your research hypothesis.

Time will tell which kinds of research questions can be answered by data-based network analysis. My guess is that its unique role lies in its ability to tell us interesting things about systems not only at an intellectual level but also in a form that appeals to our senses. Network analysis also implies a new form of scientific aesthetics that might pave the way for new forms of holistic understanding, which we urgently need to cope with the challenges of the 21st century like global warming, poverty, “terrorism” and so on. It will eventually result in new forms of maps that might extend our comprehension of complex processes and our intellectual capabilities to interact with them.

In our MemeMapper project we will try to take some (hopefully bigger) steps in that direction. We therefore appreciate requests from network researchers in order to harvest

Filed under: Uncategorized, maps, networkanalysis, theory
Posted: August 30, 2006 at 12:52 pm by Gernot
Tags: none