What is a tech innovation hub anyway?

Soon-to-launch THINK in Kigali: Incubator? Hub? Both?


Innovation and entrepreneurship “hubs” and “labs” are all the rage these days. A wide range of actors is convinced that hubs represent a genuinely new and exciting model for supporting (tech) entrepreneurs, in particular in Sub-Saharan Africa, which is the focus of my research.

Here is a snapshot of publications, just from the last two years, that have tried to define, assess, and take stock of the phenomenon:

While this list illustrates that innovation hubs are popular, little analysis has been done on why that is. One obstacle is that many, if not most, of the discussions around hubs use the term quite loosely. For one, “hub” has several connotations that our zeitgeist sees as desirable (such as open and egalitarian interaction, collaboration, or grassroots), and it appears to me that the word is often used as a brand more than a meaningful descriptor.

So, what is a (tech innovation) hub anyway? Is it just a trend term replacing “incubators”, “R&D labs”, “science parks”, “technopoles”, or “training facilities” that have recently fallen from grace? Is there anything special and new about organizations like iHub or the Impact Hub?

A few weeks ago, I joined a group of researchers[1] gathered by Tuukka Toivonen to discuss these questions. All of us had done empirical research on hubs, so we had an intuitive understanding of the concept. The group had also examined hubs with different goals, located in geographical and cultural contexts spanning Africa, Europe, and East Asia, which gave us a good range for comparisons. We were clear that hubs shouldn’t be reduced just to the hub space, and that they are instead a particular type of organization.

Word cloud of my workshop notes


Yet, we soon got lost in a jungle of buzz words and vague paradigms, and we found it surprisingly difficult to pin down the uniqueness of hubs as a new organizational form (if it is one) with conceptual precision. We also noticed that the ideals that hubs aspire to are often quite different from the more mundane realities of life and work inside of hubs.

So we decided to derive a hub definition based on the stereotypical ideal of a hub, which could then serve to distinguish hubs from other organizations based on their vision and mission. Only at a later stage did we want to compare ambition, reality, and actual “impact”.

Here are the attributes of an idealized hub that we came up with:[2]

  1. Communal

Hubs heavily emphasize that they are merely a meeting and convening point for a community, and that without this community, they would be nothing. A hub community is not just any group of people. Members of the community share a certain identity and have a sense of belonging and/or participation. This often translates into a higher mental activation (inspiration, motivation) around whatever is the common cause of the hub.

  2. Self-organizing and adaptive

The community idea is also at the heart of another defining feature of hubs: their self-organizing and adaptive nature. Hubs cannot be set up in a top-down manner; they always emerge from the “grassroots” initiative of innovators and entrepreneurs. While hubs are more stable and continuous than event series like Barcamps, Startup Weekends, conferences, or innovation competitions, they also constantly adapt to changing community needs. Accordingly, hub managers usually see themselves less as leaders and more as facilitators. While donors and sponsors are usually needed to fund hubs, they are seen only as supporters who are not allowed to impose an agenda out of line with the needs that the community articulates. This is in direct accordance with principles of the Startup Community movement. However, constant adaptation often does not jibe well with the institutional frameworks of funders, which are based on pre-specified accountability, long-term planning, and targets, in particular in the context of monitoring and evaluation mandated by development organizations.[3]

  3. Instead of innovating, enabling innovators

Implicit in the previous points is another hub attribute that is noteworthy because it is often forgotten in discussions on the “impact” and effects of hubs: hubs are not themselves creators or implementers of innovations (or projects, startups, apps, etc.). Instead, hubs see themselves as enablers of innovators and entrepreneurs, or, even more broadly, doers of some sort. Hubs can be more or less selective and stringent in terms of which doers they support, but in the end entrepreneurs are seen to be the ones with the real-world impact, while the hub just enables them. As one can imagine, this makes attributing and quantifying the impact that the hub itself has very challenging, especially if the hub offers a range of membership tiers and a variety of more or less hands-on programs. This also means that expectations towards a hub’s impact can hardly be codified as pre-specified targets (such as “number of startups launched”), and instead evaluations need to trace indirect and unexpected causal pathways of impact that result from the enabling-the-doers setup.

  4. Heterogeneous knowledge, serendipitously combined

Maybe the most interesting feature of hubs is that they aim to convene like-minded individuals while at the same time bringing together people with different backgrounds and knowledge. For instance, to stimulate software and mobile app innovations, hubs usually aim to gather techies and coders, but also bring in business people and investors. At the core is the idea that startups need complementary inputs (e.g., creative product design and financing), but also that innovation inherently relies on new and unlikely combinations of existing knowledge. Hubs build on the notion that ideation and creativity can neither be pre-specified nor coerced, and so they aim to create a structure in which individuals serendipitously interact with others they would not typically meet. This is similar to the argument that “thinking in silos” inhibits creativity, and so hubs invite people to step out of their regular work routines and openly interact with new contacts until a “happy accident” happens. Such a setup also relies on a non-hierarchical and open relationship structure between community members: everyone is encouraged to engage with everyone else. What exactly the right combination of like-mindedness and heterogeneity is remains a difficult question, and to us it seemed like one of the most interesting lines of inquiry for research on hubs.

  5. Local outposts of a higher cause

Another intriguing facet of hubs is that they emphasize adaptation to local context, but at the same time tend to frame themselves as part of a global movement. The “global entrepreneurship movement / revolution” often serves as the overarching value and belief system, and the Lean Startup and Business Model Canvas are examples of more concrete shared understandings of tech entrepreneurs. In fact, co-working itself is increasingly seen as a global movement, and it has started to yield insights and templates for the design of a hub space (which might explain why hub spaces look so alike across vastly different geographies and cultures). Importantly, hubs become local representations of globally homogeneous understandings (“movement”), but in the local context, where these understandings are unique and new, the hub can actually host a subculture (“revolution”) compared to incumbent and prevalent organizational designs and ways of doing things.


Now, I want to be clear that an organization does not necessarily have to meet all of these attributes to qualify as a hub. Instead, this list is meant to describe the idealized and stereotypical hub concept. Hubs usually only emphasize a few of these attributes, or they might aspire to live up to each one but in reality cannot meet all of them. Financial sustainability also plays a major role: hubs often have to make compromises and bend to funders’ and sponsors’ agendas to keep their ship afloat.

Yet, I do think that the above description helps us to distinguish hubs, for instance, from organizational models that only provide immediate business and resource support without a communal element, like traditional incubators. We also now have a better basis to discuss what is new about hubs, and how we could tackle questions about their effectiveness and role in innovation (eco)systems. My hope is that we can build on this start of a conceptual understanding to improve the quality of our discussions, and stop comparing apples to oranges.

Still, an actual hub will almost never fall neatly into the outlined concept. In fact, incubators, accelerators, and even science parks have started to borrow elements from the hub concept, resulting in mashups of organizational models. Another trend is the co-location of pure hub models with more structured support programs, like inside and near Bishop Magua Centre in Nairobi.[4]

In upcoming blog posts, I will outline ideas for theoretical perspectives and empirical research, as well as basic hub categories and preliminary findings on tradeoffs and funding structures from my previous and ongoing research. Please get in touch with any feedback and comments.


Tuukka Toivonen provided comments and feedback to this post. My research is funded by the Clarendon Fund and the Skoll Centre for Social Entrepreneurship at the University of Oxford.


[1] The workshop participants can be found on twitter @Tuukka_T @KindaAS @williamhan @Andrejcisneros @TimsWeiss @queaky @IrinaVPopova, and LinkedIn http://lu.linkedin.com/in/lgryszkiewicz

[2] In an interview that came out shortly after our workshop, Erik Hersman, co-founder of the iHub in Nairobi, reflected on the iHub and its unique features. Interestingly, his points are very similar to the attributes that we derived based on a much wider sample of hubs.

[3] This actually mirrors a broader problem and debate about incentive-setting and results measurement in international development projects, for instance, discussed by Robert Chambers here.

[4] In fact, people have started to call the building itself an “ecosystem”, for example, the iHub community manager.


Off to Explore the Inner Workings of African Tech Innovation Networks

Tech innovation hubs like kLab in Rwanda have been established across Africa


Even if you are just a casual follower of technology in developing countries, you will probably by now have come across blog posts and news articles touting Africa’s tech entrepreneurship boom.[1] Indeed, the first fast-growing mobile app startups have come up, the first Pan-African startup innovation platforms and conventions have been assembled, and thousands of aspiring technologists and would-be entrepreneurs across the continent are now looking to solve problems and build companies with technology. Along with the buzz, a consensus has emerged that local digital production—such as African startups targeting mobile applications at businesses or consumers in their home country or region—could and should be an important contribution to economic and social development.

At the same time, right underneath the feel-good patina of mainstream and donor media that is happy to report African success stories, there are also critical voices and emerging debates. Many of the arguments revolve around the risks and benefits of supporting local tech entrepreneurs, and how to use the scarce available resources.[2] In this context, the rise of a new type of organization, the tech innovation hub, has caught people’s attention.[3] However, it has proven extremely tricky to identify the desired and actual impact of these systemic innovation intermediaries (Smits & Kuhlmann, 2004), and so the few available assessments range all the way from questioning to excitement to disillusionment.

In short, there is clearly a lot of confusion around how to support tech entrepreneurship and early stage innovation across Africa, and no definitive models have emerged. Stakeholders of local innovation systems are still grappling with a long list of questions concerning the “if”, “who”, “how”, “when”, and “where” of tech entrepreneurship support. In particular, it is unclear whether hubs are effective as innovation brokers (Klerkx & Leeuwis, 2009).

In my dissertation research that goes beyond my work at infoDev, I want to address some of these questions. Based on my initial reading of the available literature and evidence, I believe that “innovation networks” (that is, the relations and interactions between entrepreneurs and other actor groups in innovation systems) are a key part of the puzzle, and so I will apply qualitative and quantitative social network analysis as an analytical method.[4]

From September to December 2014, I will kick off the data collection and spend one month each in Kigali, Harare, and Accra. With this study setup, I hope to capture tech innovation happening (or not happening) in contexts that differ in terms of factors such as geography, economic development, entrepreneurship mentality and “culture”, tech innovation legacy, “vibrancy” and the number of already present actors in the innovation system, and many other influences.

I’m hoping that the result will be a better understanding of the dynamics underlying African tech innovation systems. Ultimately, my research is meant to inform and shape the policy and decision-making of people engaging in questions around tech entrepreneurship and local digital production in cities all over the continent.

I invite you to comment and contact me if you would like to be involved in this research, or simply be kept posted about my findings. Do reach out especially if you are a stakeholder of the tech innovation systems of Kigali, Harare, and Accra. Academics interested in this line of research should also take a look at the AAG call that Mark Graham, Isis Hjorth, and I recently put out.



Hekkert, M. P., Suurs, R. A. A., Negro, S. O., Kuhlmann, S., & Smits, R. E. H. M. (2007). Functions of innovation systems: A new approach for analysing technological change. Technological Forecasting and Social Change, 74(4), 413–432.

Klerkx, L., & Leeuwis, C. (2009). Establishment and embedding of innovation brokers at different innovation system levels: Insights from the Dutch agricultural sector. Technological Forecasting and Social Change, 76(6), 849–860.

Smits, R., & Kuhlmann, S. (2004). The rise of systemic instruments in innovation policy. International Journal of Foresight and Innovation Policy, 1(1-2), 4–32.


[1] Here is a sample of articles in major media outlets: New York Times, Huffington Post, The Guardian, NPR, The Economist, Tech Crunch, and Wired UK.

[2] A nerve was struck by a Wired UK article that resulted in a chain of at times combative blog posts by Tom Jackson, Sam Gichuru, Mbwana Alliy, Josiah Mugambi, Erik Hersman, and Jon Stever (as collected by Erik and Jon in their blog posts). A more recent debate was started by Dan Evans and followed by answers from TMS Ruge and Jon Gosier. Also the comments section of Tim Kelly’s widely noted blog post brings up some interesting issues.

[3] At the risk of missing others, a few of the better known addresses on tech innovation hubs are: Afrilabs, VC4Africa’s tag on Hubs, Bongohive’s Hubs Map, the iHub blog, and Afrihive.

[4] That said, I will complement the network analyses with broader, qualitative innovation system assessments based on Hekkert et al.’s (2007) “functions” perspective.


Diary of an internet geography project #4

Continuing with our series of blog posts exposing the workings behind a multidisciplinary big data project, we talk this week about the process of moving between small data and big data analyses. Last week, we did a group deep dive into our data. Extending the metaphor: Shilad caught the fish and dumped them on the boat for us to sort through. We wanted to know whether our method of collecting and determining the origins of the fish was working by looking at a bunch of randomly selected fish up close. Working out how we would do the sorting was the biggest challenge. Some of us liked really strict rules about how we were identifying the fish. ‘Small’ wasn’t a good enough description; better would be that small = 10-15cm diameter after a maximum of 30 minutes out of the water. Through this process we learned a few lessons about how to do this close-looking as a team.

Step 1: Randomly selecting items from the corpus

We wanted to know two things about the data that we were selecting through this ‘small data’ analysis: Q1) Were we getting every citation in the article or were we missing/duplicating any? Q2) What was the best way to determine the location of the source?

Shilad used the WikiBrain software library he developed with Brent to identify the roughly one million geotagged Wikipedia articles. He then collected all external URLs (about 2.9 million unique URLs) appearing within those articles and used this data to create two samples for coding tasks. He sampled about 50 geotagged articles (to answer Q1) and selected a few hundred random URLs cited within particular articles (to answer Q2).

  • Batch 1 for Q1: 50 documents each containing an article title, url, list of citations, empty list of ‘missing citations’
  • Batch 2 for Q2: Spreadsheet of 500 random citations occurring in 500 random geotagged articles.
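The two-batch sampling above could be sketched as follows. This is an illustration only, not the actual WikiBrain pipeline: the toy corpus, the reduced sample sizes, and the placeholder urls are all assumptions.

```python
import random

# Hypothetical toy corpus: {article_title: [cited_url, ...]} for geotagged articles.
# The real corpus has roughly one million articles and ~2.9 million unique URLs.
articles = {
    "Montesquiu": [
        "http://books.google.ca/...",   # placeholder urls, not real citations
        "http://www.idescat.cat/...",
    ],
    "Teatro Calderón (Valladolid)": [
        "http://books.google.ca/...",
    ],
}

random.seed(42)  # fixed seed so the draw is reproducible

# Batch 1: whole articles, to check that every citation is caught (k=50 in the study).
batch1 = random.sample(sorted(articles), k=2)

# Batch 2: individual (article, url) citation pairs for location coding (k=500 in the study).
all_citations = [(title, url) for title, urls in articles.items() for url in urls]
batch2 = random.sample(all_citations, k=2)
```

Sampling articles and citations separately matters: a uniform draw over citations would over-represent heavily cited articles, which is exactly what Q2 needs but Q1 does not.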

Example from batch 1:

Coding for Montesquiu

  1. Visit the page at Montesquiu
  2. Enter your initials in the ‘coder’ section
  3. Look at the list of extracted links below in the ‘Correct sources’ section
  4. Add a short description of each missed source to the ‘Missed sources’ section

Initials of person who coded this:

Correct sources

Missing sources

Example from batch 2:

| url domain | effective domain | article | article url |
| --- | --- | --- | --- |
| books.google.ca | google.ca | Teatro Calderón (Valladolid) | http://en.wikipedia.org/ |
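Deriving the effective domain (books.google.ca → google.ca) is not a fixed-length string operation, since public suffixes vary in length (.ca vs. .co.uk); production code would consult the full Public Suffix List, for instance via the third-party tldextract package. A rough stdlib-only sketch, with a hand-picked suffix table as a stand-in:

```python
from urllib.parse import urlparse

# Tiny stand-in for the Public Suffix List; real code should load the full list.
TWO_LABEL_SUFFIXES = {"co.uk", "com.au", "co.za"}

def effective_domain(url: str) -> str:
    host = urlparse(url).netloc.lower()
    labels = host.split(".")
    # Keep one label in front of a known two-label public suffix (bbc.co.uk);
    # otherwise keep the last two labels (google.ca).
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])
```

For example, effective_domain("http://books.google.ca/books") returns "google.ca", while effective_domain("http://www.bbc.co.uk/news") returns "bbc.co.uk" (both urls hypothetical).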

For batch 1, we looked up each article and made sure that the algorithm we were using was catching all the citations. We found a few anomalies: duplicated citations (for example, when a single citation contained two urls, one to the ISBN address and another to a Google Books url); missing citations (when the API listed a URL only once although it had been used multiple times, or when a book was cited without a url); and incorrect citations (when the citation url pointed to the Wikipedia article on the Italian National Institute of Statistics (Istat) rather than to the Istat domain).

The town of El Bayad in Libya, for example, contained two citations that weren’t included in the analysis because they didn’t contain a url. One appears to be a newspaper and the other a book, but I couldn’t find either citation online. This was the only example of its kind:

  • Amraja M. el Khajkhaj, “Noumou al Mudon as Sagheera fi Libia”, Dar as Saqia, Benghazi-2008, p.120.
  • Al Ain newspaper, Sep. 26; 2011, no. 20, Dar al Faris al Arabi, p.7.

We listed each of these anomalies in order to work out whether a) we can accommodate them in the algorithm, or b) there are so few of them that they probably won’t affect the analysis too heavily.
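Option a) could start with a url-normalization pass that collapses trivially different urls before counting. This is a toy sketch, not the project’s code: the normalization rules are assumptions, and harder cases, such as a single citation carrying both an ISBN link and a Google Books link, would need extra logic.

```python
from urllib.parse import urlparse

def canonical_key(url: str) -> str:
    """Reduce a url to a comparison key: drop scheme, 'www.' prefix, trailing slash."""
    parts = urlparse(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/").lower()

def dedupe_citations(urls):
    """Keep the first url seen for each canonical key, preserving order."""
    seen, unique = set(), []
    for url in urls:
        key = canonical_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

For instance, "http://www.istat.it/en/" and "https://istat.it/en" collapse to the same key and are counted once.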

Step 2: Developing a codebook and initial coding

I took the list of 500 random citations in batch 2 and went through each one to develop a new list of 100 working URLs and a codebook to help the others code the same list. I discarded 24 dead links and developed a working definition for each code in the codebook.

The biggest challenge when trying to locate citations in Wikipedia is whether to define the location according to the domain that is being pointed to, or whether one should find the original source. Google books urls are the most common form of this challenge. If a book is cited and the url points to its Google books location, do we cite the source as coming from Google or from the original publisher of the work?

My initial thought was to code the URL location rather than the original location — mostly because it seemed like the easiest way to scale up the analysis after this initial hand coding. But after discussing it, I really appreciated it when Brent said, ‘Let’s just start this phase by avoiding thinking like computer scientists and code how we need to code without thinking about the algorithm.’ Instead, we tried to use this process as a way to develop a number of different ways of accurately locating sources, and to see whether there were any major differences afterwards. Rather than using just one field for location, we developed three coding categories.

Source country:

Country where the article’s subject is located | Country of the original publisher | Country of the URL publisher

We’ll compare these three to the country of the administrative contact for the URL’s domain, which Shilad and Dave are working on extracting automatically.

When I first started doing the coding, I was really interested in looking at other aspects of the data, such as what kinds of articles are being captured by the geotagged list, as well as what types of sources are being employed. So I created two new codes: ‘source type’ and ‘article subject’. I defined the article subject as ‘the subject/category of the article referred to in the title or opening sentence, e.g. “Humpety is a village in West Sussex, England” (subject: village)’. I defined source type as ‘the type of site/document etc. that *best* describes the source, e.g. if the url points to a list of statistics but it’s contained within a newspaper site, it should be classified as “statistics” rather than “newspaper”’.

Coding categories based on example item above from batch 2:

| subject | subject country | original publisher location | URL publisher location | language | source type |
| --- | --- | --- | --- | --- | --- |
| building | Spain | Spain | US | Spanish | book |

In our previous project we divided up the ‘source type’ into many facets. These included the medium (e.g. website, book, etc.) and the format (statistics, news, etc.). But this can get very complicated very fast, because there are a host of websites that do not fall easily into these categories: a url pointing to a news report by a blogger on a newspaper’s website, for example, or a link to a list of hyperlinks that download as spreadsheets on a government website. This is why I chose to use a ‘best guess’ for the type of source: choosing one category ends up being much easier than the faceted coding that we did in the previous project.

The problem was that this wasn’t a very conclusive definition and would not result in consistent coding. It is particularly problematic because we are doing this project iteratively and we want to try to get as much data as possible so that we have it if we need it later on. After much to-ing and fro-ing, we decided to go back to our research questions and focus on those. The most important thing that we needed to work out was how we were locating sources, and whether the data changed significantly depending on what definition we used. So we decided not to focus on the article type and source type for now, choosing instead to look at the three ways of coding location of sources so that we could compare them to the automated list that we develop.

This has been the hardest part of the project so far, I think. We went backwards and forwards a lot about how we might want to code this second set of randomly sampled citations. What definition of ‘source’ and ‘source location’ should we use? How do we balance the need to find the most accurate way to catch all outliers and a way that we could abstract into an algorithm that would enable us to scale up the study to look at all citations? It was a really useful exercise, though, and we have a few learnings from it.

- When you first look at the data, make sure you all do a small data analysis using a random sample;

- When you do the small data analysis, make sure you suspend your computer scientist view of the world and try to think about what is the most accurate way of coding this data from multiple facets and perspectives;

- After you’ve done this multiple analysis, you can then work out how you might develop abstract rules to accommodate the nuances in the data and/or to do a further round of coding to get a ‘ground truth’ dataset.

In this series of blog posts, a team of computer and social scientists including Heather Ford, Mark Graham, Brent Hecht, Dave Musicant and Shilad Sen documents the process of working together on a project to understand the geography of Wikipedia citations. Our aim is not only to better understand how far Wikipedia has come toward representing ‘the sum of all human knowledge’, but to do so in a way that lays bare the processes by which ‘big data’ is selected and visualized.


Wikipedia and breaking news: The promise of a global media platform and the threat of the filter bubble

Heather Ford gave this talk at Wikimania in London on Sunday, warning Wikipedians that Wikipedia is by no means a completely ‘neutral’ resource and that it suffers from a ‘homegrown’ bias resulting from different points of view about what is notable at a local level. This is especially true when a newsworthy event happens far from the center of Wikipedia activity (and Western media scrutiny) in North America and Western Europe. It has been cross-posted from hblog.org.

In the first years of Wikipedia’s existence, many of us said that, as an example of citizen journalism and journalism by the people, Wikipedia would be able to avoid the gatekeeping problems faced by traditional media. The theory was that because we didn’t have the burden of shareholders and the practices that favoured elite viewpoints, we could produce a media that was about ‘all of us’ and not just ‘some of us’.

Dan Gillmor (2004) wrote that Wikipedia was an example of a wave of citizen journalism projects initiated at the turn of the century in which ‘news was being produced by regular people who had something to say and show, and not solely by the “official” news organizations that had traditionally decided how the first draft of history would look’ (Gillmor, 2004: x).

Yochai Benkler (2006) wrote that projects like Wikipedia enable ‘many more individuals to communicate their observations and their viewpoints to many others, and to do so in a way that cannot be controlled by media owners and is not as easily corruptible by money as were the mass media’ (Benkler, 2006: 11).

I think that at that time we were all really buoyed by the idea that Wikipedia and peer production could produce information products that were much more representative of “everyone’s” experience. But the idea that Wikipedia could avoid bias completely, I now believe, is fundamentally wrong. Wikipedia presents a particular view of the world while rejecting others. Its bias arises both from its dependence on sources that are themselves biased and from its own policies and practices that favour particular viewpoints. Although Wikipedia is as close to a truly global media product as we have probably ever come*, like every media product it is a representation of the world, and it is the result of a series of editorial, technical and social decisions made to prioritise certain narratives over others.


Labour and the Internet: Takeaways From the ISA World Congress of Sociology #3

In this series of blog posts, Vili Lehdonvirta writes about his thoughts and experiences at ISA 2014 in Yokohama, Japan. In this third and final part, he discusses organized labour and global production networks, and draws implications for the study of online work. Read the first and second parts.

In my last post, I concluded that we must ask not only how individuals and institutions win or lose from labour market flexibilization, but also how they adapt to it. Unions are among the institutions perhaps most troubled by flexibilization. As work becomes more individualized, there are fewer common interests and shared identities on the basis of which workers could be organized. Workers may also be physically and temporally more detached from each other, especially in online work. But the need for collective action has not gone anywhere; on the contrary, the dismantling of labour standards means that there is no institutional limit to how poor the wages and how onerous the terms can be for individuals with little or no bargaining power. Indeed, several sessions at ISA gave examples of just such wages and terms.

In a session on migrant worker diasporas, presenters such as Janice Fine discussed the challenges that traditional unions face in adapting to the changing circumstances. One challenge is that unions are often against flexibilization as a whole, instead holding up standard regulated employment as the gold standard. Some workers, like the independent professionals studied by Guido Cavalca, express a preference for flexibility, and for this reason don’t see unions as representing their interests. Other workers are nonstandard by necessity, and find themselves excluded from some traditional unions, as described by Fine. However, in this session and also in another session on informal workers, presenters reported success stories of how nonstandard workers around the world have begun to organize and engage in collective actions that differ somewhat from the ways of traditional unions. Some adaptation seems to be happening.

The rise of global production networks

Labour market flexibilization in industrialized societies took place over the same period of time as the liberalization of international trade, and thus also involves a significant global dimension: the restructuring of corporate value chains to reach across the globe, and the resulting new international divisions of labour. ISA sessions paid particular attention to this global dimension, with sessions devoted specifically to topics such as transnational corporate networks, migrant workers, Chinese workers in the global economy, and perspectives on development from low-income countries. Given this year’s theme, many presenters discussed how the international reshuffling of work alleviates some local and global inequalities, while sometimes creating significant new inequalities. For example, Ngai Pun’s paper on Foxconn’s almost totalizing control over millions of Chinese workers’ lives both during and outside working hours was familiar, but still startling.


“GPNs undercut local national union bargaining power, because decision-making happens elsewhere not accessible for locally negotiated response”

In a session on global labour movements, Michael Fichter used the concept of global production networks (GPN) to theorize transnational corporations’ value chains from a labour perspective. According to Fichter, GPNs led by transnational corporations account for 80 percent of global trade today. By focusing on global networks rather than national labour markets or individual transnational corporations as the unit of analysis, we obtain a fuller picture of labour market dynamics. GPNs drive the precarization of work across the chain, from outsourcing and offshoring to whipsawing. Moreover, GPNs undercut local national union bargaining power, because corporate decision-making happens elsewhere and is not accessible for locally negotiated response. GPNs are also constructed strategically for profit, not in accordance with traditional sectoral boundaries like unions are, further complicating collective action.

In the same session, Jeremy Anderson presented an example of how one international trade union federation, ITF, is learning to grapple with GPNs. It tries to exercise “logistical power”, or identify strategic intervention points in the network, and to create “resonant places”, or internationally supported local struggles with wider resonance. Other papers in the session were of equally high quality and formed interesting complements and contrasts, making this in my opinion the most successful session of all that I attended. There was also another session with overlapping content that I missed.

Digital revolutions?

An elephant in the room that in my opinion remains to be fully acknowledged by most sociologists of work and labour movements is the rapidly advancing digitalization across societies and workplaces. What bearing will it have on the issues discussed above? In policy discourses, digital connectivity is often associated with a number of potential new positive developments: open access to global markets, increased opportunities for individuals, disintermediation and decreased dependency on institutions, new socially inclusive models of production, irrelevance of geography and location, ability to voice grievances, and ability to self-organize and take collective action as a crowd. If these are indeed the consequences, digitalization will be associated with massive upheaval in the world’s labour markets and production networks.


“connectivity does change the way labour and production functions, but we must temper our expectations”

While I missed most sessions dealing with digital media as such, a highly enlightening paper was presented at a session on the internationalization of knowledge workers by Michael Vicente. His paper dealt with open source software development, or more accurately, free software development. The open source model of software development is sometimes painted as an example of a new digitally enabled production model that transcends geographic boundaries and is socially inclusive, in that anyone can contribute. But according to Vicente’s analyses, open source development communities are in practice highly geographically territorial and homophilic. Leading projects studied by Vicente are embedded in a handful of elite academic institutions from which most of their contributors are drawn, and the activity is highly professionalized in the sense of excluding those not familiar with specific tools, processes, and language.

In other words, although an open source software development community is clearly different from a software company in many respects, Vicente’s work shows that upon closer inspection, the model doesn’t necessarily live up to the most exaggerated expectations. The more general point here is this: digital connectivity does matter and can change the way labour markets and global production networks function, but we must temper our expectations, ground them in the bigger societal picture, and subject them to rigorous empirical investigation before drawing too many conclusions. In studying online work and online labour markets, we might thus ask the following questions:

  1. How do online labour markets relate to global value networks controlled by transnational corporations? To what extent do they simply extend these existing networks, and to what extent do they offer genuinely novel means of international economic organization?
  2. How do workers assert collective power on online labour markets, or in the larger "virtual production networks" that these markets are part of?
  3. How do workers use the Internet for organizing transnationally across physical and virtual production networks?

This is the end of my series of blog posts on ISA 2014. Thanks for reading, and feel free to comment. If you want to read more stuff from me, get my book Virtual Economies: Design and Analysis (MIT Press 2014, with Edward Castronova), read my latest article in The Information Society, or follow me on Twitter.


Full disclosure: Diary of an Internet geography project #3

In this series of blog posts, we are documenting the process by which a group of computer and social scientists are working together on a project to understand the geography of Wikipedia citations. Our aim is not only to better understand how far Wikipedia has come in representing 'the sum of all human knowledge', but to do so in a way that lays bare the processes by which 'big data' is selected and visualized. In this post, I outline the way we initially thought about locating citations, and Dave Musicant tells the story of how he has started to build a foundation for coding citation location at scale. It includes feats of superhuman effort, including the posting of letters to a host of companies around the world (and you thought that data scientists sat in front of their computers all day!)

Many articles about places on Wikipedia include a list of citations and references linked to particular statements in the text of the article. Some of the smaller language Wikipedias have fewer citations than the English, Dutch, or German Wikipedias, and some have very few indeed, but the sources of information about places can still act as an important signal of 'how much information about a place comes from that place'.

When Dave, Shilad and I did our overview paper ('Getting to the Source'), looking at citations on English Wikipedia, we manually looked up the whois data for a set of 500 randomly collected citations for articles across the encyclopedia (not just about places). We coded citations according to their top-level domain: if the domain was a country code top-level domain (such as '.za'), we coded it according to the country (South Africa), but if it used a generic top-level domain such as .com or .org, we looked up the whois data and entered the country for the administrative contact (since the technical contact is often a domain registration company located in a different country). The results were interesting, but perhaps unsurprising. We found that the majority of publishers were from the US (at 56% of the sample), followed by the UK (at 13%) and then a long tail of countries including Australia, Germany, India, New Zealand, the Netherlands and France at either 2 or 3% of the sample.
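For concreteness, the coding rule we used can be sketched in a few lines of Python. This is a minimal illustration, not the actual code from the paper: the ccTLD table shows only a handful of entries (in practice it would cover the full IANA ccTLD list), and the function name is ours.

```python
# A minimal sketch of the coding rule described above. The ccTLD table is
# illustrative only; generic TLDs would be resolved via a whois lookup of
# the administrative contact's country.
CCTLD_COUNTRIES = {
    "za": "South Africa",
    "uk": "United Kingdom",
    "de": "Germany",
    "au": "Australia",
}

def code_citation(domain):
    """Country-code a citation domain: ccTLDs map directly to a country;
    generic TLDs (.com, .org, ...) need a whois lookup instead."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in CCTLD_COUNTRIES:
        return CCTLD_COUNTRIES[tld]
    return None  # generic or unknown TLD: fall back to whois data
```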


Geographic distribution of English Wikipedia sources, grouped by country and continent. Ref: ‘Getting to the Source: Where does Wikipedia get its information from?’ Ford, Musicant, Sen, Miller (2013).

This was useful to some extent, but we also knew that we needed to extend this to capture more citations, and to do this across particular types of article in order for it to be more meaningful. We were beginning to understand that local citation practices (local in the sense of the type of article and the language edition) dictated particular citation norms, and that we needed to look at particular types of article in order to better understand what was happening in the dataset. This is a common problem besetting many 'big data' projects when the scale is too large to get at meaningful answers. It is this deeper understanding that we're aiming at with our Wikipedia geography of citations research project. Instead of just a random sample of English Wikipedia citations, we're going to be looking up citation geography for millions of articles across many different languages, but only for articles about places. We're also going to be complementing the quantitative analysis with some deep-dive qualitative analysis of citation practice within articles about places, across many language versions, not just English. In the meantime, though, Dave has been working on the technical challenge of how to scale up location data for citations, using whois lookups as a starting point.

[hands over the conch to Dave…]

In order to try to capture the country associated with a particular citation, we thought that capturing information from whois databases might be instructive since every domain, when registered, has an administrative address which represents in at least some sense the location of the organization registering the domain. Though this information would not necessarily always tell us precisely where a cited source was located (when some website is merely hosting information produced elsewhere, for example), we felt like it would be a good place to start.

To that end, I set out to do an exhaustive database lookup by collecting the whois administrative country code associated with each English Wikipedia citation. For anyone reading this blog who is familiar with the structure of whois data, this would readily be recognized as exceedingly difficult to do without spending lots of time or money. However, these details were new to me, and it was a classic experience of me learning about something “the hard way.”

I soon realised how difficult it was going to be to obtain the data quickly. Whois data for a domain can be obtained from a whois server. This data is typically obtained interactively by running a whois client, which is most commonly either a command-line program or a whois client website. I found a Python library to make this easy if I already had the IP addresses I needed, and in initial benchmarking I discovered that I could run about 1,000 IP-address-based whois queries an hour. That would make it exceedingly slow to look up the millions of citations in English Wikipedia, before even getting to other language versions. I later discovered that most whois servers limit the number of queries that you can make per day, and had I continued along this route, I undoubtedly would have been blocked from those servers for exceeding daily limits.
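For readers curious what this kind of script looks like, here is a rough sketch, assuming the standard command-line `whois` client is installed. The parsing pattern covers only the common RIR-style `country:` field; real whois output varies enormously between servers, and the function names and delay value are ours, not from the project's actual code.

```python
import re
import subprocess
import time

def parse_whois_country(whois_text):
    """Extract the first 'country:' field from raw whois output.
    Whois formats vary by server; this only covers the common case."""
    match = re.search(r"^\s*country:\s*([A-Za-z]{2})\s*$",
                      whois_text, re.IGNORECASE | re.MULTILINE)
    return match.group(1).upper() if match else None

def rate_limited_whois(ip_addresses, delay=3.6):
    """Query whois via the command-line client, one IP at a time.
    A 3.6-second delay works out to roughly 1,000 queries an hour -
    and most whois servers will still enforce their own daily limits."""
    for ip in ip_addresses:
        output = subprocess.run(["whois", ip], capture_output=True,
                                text=True).stdout
        yield ip, parse_whois_country(output)
        time.sleep(delay)
```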

The team chatted, and we found what seemed to be some good options for doing bulk whois lookups. We found the web pages of the Regional Internet Registry (RIR) ARIN, which has a system whereby researchers can request access to its entire database after filling out some forms. Apart from the red tape (the forms had to be mailed in by post), this sounded great. I then discovered that ARIN and the other RIRs make a dump of the IP address ranges and country codes that they allocate publicly available via FTP. 'Perfect!' I thought. I downloaded this data and decided that, since I was already looking up the IP addresses associated with the Wikipedia citations before doing the whois queries, I could look up those IP addresses in the bulk data available from the RIRs instead.
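A sketch of how that bulk data can be used for IP-to-country lookups. The RIRs publish their allocations as pipe-delimited "delegated statistics" records of the form `registry|cc|type|start|value|date|status` (for ipv4 records, `value` is the number of addresses in the block); the exact details vary slightly between registries, and the function names here are ours.

```python
import bisect
import ipaddress

def load_rir_ranges(lines):
    """Parse RIR delegated-statistics lines into sorted
    (start_address_as_int, address_count, country_code) tuples,
    skipping header lines, summary lines, and non-ipv4 records."""
    ranges = []
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 7 or fields[2] != "ipv4":
            continue  # headers, summaries, ipv6/asn records
        _registry, cc, _type, start, value, _date, status = fields[:7]
        if status not in ("allocated", "assigned"):
            continue
        ranges.append((int(ipaddress.IPv4Address(start)), int(value), cc))
    ranges.sort()
    return ranges

def country_for_ip(ip, ranges):
    """Binary-search the sorted ranges for the block containing this IP."""
    ip_int = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(ranges, (ip_int, float("inf"), "")) - 1
    if i >= 0:
        start, count, cc = ranges[i]
        if start <= ip_int < start + count:
            return cc
    return None
```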

Now that I had a feasible plan, I proceeded to write more code to look up IP addresses for the domains in each citation. This was much faster, as domain-to-IP lookups are done locally, at our DNS server. I could now do approximately 600 lookups a minute to get IP addresses, and then an effectively instant lookup for country code in the data I had obtained from the RIRs. It was then pointed out to me, however, that this approach was flawed because of content distribution networks (CDNs), such as Akamai. Many large and medium-sized companies use CDNs to mirror their websites, and when you look up a domain to get an IP address, you get the IP address of the CDN, not of the original site. 'Ouch!' This approach would not work…

I next considered going back to the full bulk datasets available from the RIRs. After filling out some forms, mailing them abroad, and filling out a variety of online support requests, I finally engaged in email conversations with some helpful folks at two of the RIRs who told me that they had no information on domains at all. The RIRs merely allocate ranges of IP address to domain registrars, and they are the ones who can actually map domain to IP. It turns out that the place to find the canonical IP address associated with a domain is precisely the same place as I would get the country code I wanted: the whois data.

Whois data isn’t centralized – not even in a few places. Every TLD essentially has its own canonical whois server, each one of which reports the data back in its own different natural-text format. Each one of those servers limits how much information you can get per day. When you issue a whois query yourself, at a particular whois server, it in turn passes the query along to other whois servers to get the right answer for you, which it passes back along.

There have been efforts to make this simpler. The software projects ruby-whois and phpwhois implement a large number of parsers designed to cope with the outputs of all the different whois servers, but you still need to be able to get the data from those servers without being rate-limited. Commercial providers offer bulk lookups at a cost – they query what they can at whatever speed they can, and archive the results. But they are quite expensive: Robowhois, one of the more economical bulk providers, asks for $750 for 500,000 lookups. Furthermore, there is no particularly good way to validate the accuracy or completeness of their mirrored databases.

It was finally proposed that maybe we could do this ourselves with parallel processing, using multiple IP addresses of our own so as not to get rate-limited. I began looking into that possibility, but only then did I realize that many whois providers don't include country codes at all in the results of a whois query. At the time of writing, none of the following queries returns a country code:

whois news.com.au

whois pravda.ru

whois gov.br

whois thestar.com.my

whois jordan.gov.jo

whois english.gov.cn

whois maroc.ma

So after all that, we’ve landed in the following place:

- Bulk querying whois databases is exceedingly time-consuming or expensive, and carries the risk of being blocked by rate-limited servers.

- Even if those problems were solved, many TLDs don't provide country code information in a whois lookup, which would make an exhaustive lookup pointless: the results would be skewed towards those domains where country information is available.

- I’m a lot more knowledgeable than I was about how whois works.

So, after a long series of efforts, I found myself dramatically better educated about how whois works; and in much better shape to understand why obtaining whois data for all of the English Wikipedia citations is so challenging.



Labour and the Internet: Takeaways From the ISA World Congress of Sociology #2

In this series of blog posts, Vili Lehdonvirta writes about his thoughts and experiences at ISA 2014 in Yokohama, Japan. In this second part, he discusses professionals and knowledge workers in the context of labour market flexibilization, and draws implications for the study of online work. Read the first part.

In my previous post, I adopted labour market flexibilization as the overall frame through which I present my ISA thoughts and learnings. Sociologists tend to focus more on identifying the exploited and the downtrodden, but there are of course also workers who benefit from flexibilization. Most nonstandard work is associated with lower earnings over a career and higher exposure to individual and social risks, but some workers are able to negotiate better deals for themselves in a more flexible market. Typical sources of such bargaining power are special skills and membership in a regulated profession. Both were discussed in the several interesting sessions dealing with the sociology of professions.

Winners, losers, and those who adapt

In a session on the cross-bordering and internationalization of knowledge workers, Gerbrand Tholen recapped how established professions, such as medicine and law, use labour market strategies aimed at establishing occupational monopolies over the provision of certain skills and competences. Such strategies are often based on a certifiable body of knowledge acquired through the education system. However, in newly emerging occupations, the link between university studies and job content is often tenuous and jobs diverse. Tholen noted that many newly emerging professions, such as software engineers, consequently enjoy little or no monopoly power over the provision of their skills to the marketplace. The power of formal qualifications also tends to be limited to the national contexts in which they are created. Other “employability” strategies may be available, however.

The other source of bargaining power, special skills, was also addressed indirectly in several presentations. In a session titled "Knowledge Workers: Processes of Hybridization, Marketization and Subjectivation", Guido Cavalca and colleagues presented a study of medium- and high-skilled independent professionals in the Milan area, including webmasters, web designers, public relations consultants, and architects. The researchers found that their subjects fell into two groups: those who felt that they were successful and autonomous independent professionals, or had a good chance of becoming one; and those who felt that they were simply low-status employees in disguise, economically dependent on their employers but with no job security. These "hybrid workers" get the worst of both worlds: the lack of autonomy associated with employment, and the negative economic risk associated with entrepreneurship. Elsa Vivant reported similar findings from France.

These divergent outcomes had to do at least partly with differing levels of skill and seniority, as well as with work content. Cavalca's study also included interesting findings on subjectivation and identity, which echoed some of my earlier findings on low-status online contractors' identity talk: namely, that regardless of how successful and independent they were in practice, contractors tended to laud their freedoms and draw favourable contrasts with regular jobs, possibly as a way of maintaining a positive outlook despite sometimes tough circumstances.

In the same session as Cavalca, I presented my working paper on "Freedom and Agency in Low-Status Online Contract Work", based on data I collected with Paul Mezier at the LSE over a year ago. Though it was originally scheduled as a distributed paper, I eventually got to give a full presentation. It deals with one aspect of how workers cope (or fail to cope) with flexible labour markets. This question was also touched on in some other presentations: how individuals and institutions not only win or lose, but also adapt to labour market flexibilization.

Transnational professionals or digital precariat?

What implications can we draw from the above to the study of online work and online labour markets? The following questions come to mind:

  1. Are we likely to see a professionalization of some types of transnational online work? In other words, are there strategies available to some classes of workers to establish monopoly power over the provision of certain skills to online labour markets?
  2. Do differences in skills explain divergent outcomes for workers on online labour markets, from successful entrepreneurship to exploitation? If not skills, then what?
  3. Does online work give rise to new occupational or class identities, and how do these identities intersect with existing national, social, and gender identities?
  4. How do individuals cope with and adapt to online work, and shape their everyday lives around it? What implications do these adaptations have for family, community, and society?

In the third and final part of my ISA 2014 report, I look at organized labour and global production networks, and draw additional implications for the study of online work.


AAG 2015 CFP – From Online Sweat Shops to Silicon Savannahs: Geographies of Production in Digital Economies of Low-Income Countries


From Online Sweat Shops to Silicon Savannahs: 
Geographies of Production in Digital Economies of Low-Income Countries


AAG Annual Meeting, Chicago, April 21-25, 2015
(sponsored by the Development Geography Specialty Group)

Mark Graham, Nicolas Friederici, and Isis Hjorth, University of Oxford

Throughout the early 21st century, Internet and mobile phone access in developing countries has skyrocketed, and today the majority of people on the planet are connected through information and communication technologies (ICTs). Yet, while basic ICT access is increasingly level across income groups and geographies, production in the global digital economy is still, and maybe increasingly, dominated by incumbent multinational technology corporations or fast-scaling web startups. These businesses tend to roll out their products (with some local adaptation) across the globe, but maintain their coordinating and creative activities in places like Silicon Valley, Tel Aviv, or London, exploiting both agglomeration and dispersion economies in digital production (Malecki & Moriset, 2007; Moriset & Malecki, 2009).

How does digital production in low-income countries fare in the face of this dominance? Policymakers and the private sector in several low-income countries (especially in Sub-Saharan Africa) have set out to transform their economies through ICTs, explicitly emphasizing local digital production. Two sectors that are often seen as promising are (1) low-skill/cost-competition, such as business process outsourcing and digital microwork, and (2) high-skill/entrepreneurial innovation, such as startups developing and commercializing mobile and online applications.

However, what are the concrete and realistic potentials and possibilities for low-income countries to become important hubs for digital production? What are palpable economic outcomes of Kenya’s status as the “Silicon Savannah” or Lagos as the “Silicon Lagoon,” and who are the winners and losers of local ICT entrepreneurship and innovation? Do ICTs really deliver economic inclusion and employment to remote geographies and low-income groups, or are we witnessing the rise of online sweatshops that further enhance exploitation of vulnerable populations?

This session will explore these themes, encouraging contributions from a variety of perspectives. We invite authors to consider digital production in low-income/developing countries through lenses such as:

  • Empirical or theoretical perspectives on digital production and its (uneven) geographies
  • Discourse around digital production and its promises and risks
  • Distributions of value creation and extraction across actor groups (winners/losers)
  • Tensions of scaling versus local adaptation in digital production, in application to geography and inclusion/exclusion effects
  • Uneven production geographies within countries, in particular, differences and divides between rural/peri-urban/urban clusters
  • Socio-demographic analyses of economic actors engaging in digital production
  • Case studies of low-skill/cost-competition digital production (e.g., business process outsourcing, microwork, etc.)
  • Case studies of high-skill/entrepreneurial innovation in digital production (e.g., mobile/online applications startups, technology innovation hubs)
  • Analyses and recommendations for local and international policy pertaining to digital production

To be considered for the session, please send your abstract of 250 words or fewer to: mark.graham@oii.ox.ac.uk, nicolas.friederici@oii.ox.ac.uk, and isis.hjorth@oii.ox.ac.uk

The deadline for receipt of abstracts is October 1, 2014. Notification of acceptance will be given before October 7. Authors of accepted papers will then need to register for the AAG conference at aag.org. Accepted papers will be considered for a special issue or edited volume edited by the organizers.


Malecki, E. J., & Moriset, B. (2007). The paradox of a “double-edged” geography: local ecosystems of the digital economy. In The Digital Economy: Business Organization, Production Processes and Regional Developments (pp. 174–198). New York, NY: Routledge.
Moriset, B., & Malecki, E. J. (2009). Organization versus Space: The Paradoxical Geographies of the Digital Economy. Geography Compass, 3(1), 256–274.


Final Project Report: Promises of Fibre-Optic Broadband in the Kenyan Tourism and Tea Sectors

My colleagues Professor Timothy Waema and Charles Katua at the University of Nairobi have recently finished a report describing and summarising some of their research into the effects of changing connectivity in the Kenyan tea and tourism sectors.

You can find the full report below, and we would welcome any feedback that you might have:

Waema, T. and Katua, C. 2014. The Promises of Fibre-Optic Broadband in Tourism and Tea Sectors: A Pipeline for Economic Development in East Africa.

The report comes out of our larger project looking at ‘development’ and broadband internet access in East Africa. In the next few months, Chris Foster, Nicolas Friederici, and I will also be releasing reports looking at changing connectivity in the tea and tourism sectors of Rwanda, and the implications of changing connectivity on Kenya and Rwanda’s Business Process Outsourcing sector.