A few days ago Prateek Jain, Pascal Hitzler, Krzysztof Janowicz, and Chitra Venkatramani from Knoesis published a short writeup, “There is no money in Linked Data”, and started a corresponding discussion on the W3C Semantic Web mailing list. In this post I want to summarize the short discussion I had on this topic with Pascal Hitzler, since my impression is that we often overlook the historical analogy between the development of open-source software a few years ago and that of open data now.
Pascal and colleagues argue that “… using these (Linked Open) datasets in realistic settings is not always easy. Surprisingly, in many cases the underlying issues are not technical but legal barriers erected by the LD data publishers.” Pascal concluded in the mailing list thread:
Pascal: The notion of (Linked Open Data) simply is rather unclear. “Linked Open Data must have an open licence” is – in the light of the analysis in the paper – almost meaningless, as “openness” of licences is not a boolean. There are many shades to it, and most of these shades do not allow readily for commercialization.
I somewhat disagree with that statement: there is (and should be) a clear (boolean) definition of what open means: the Open Definition.
The Open Definition precisely defines the requirements a license must meet in order to be called open. Allowing remixing and republishing, availability of data in bulk, and non-discriminatory licensing that allows commercial reuse are core requirements of the Open Definition. Open Data is not fundamentally different from other “open” domains, e.g. open-source software, and for open-source software there is also a clear definition (overseen by OSI), which is by now widely enforced. I’m a big fan of both: Linked Data as a data integration paradigm within and between organizations AND Linked Open Data as a way to share data and knowledge openly on the Web. With the Open Definition we have a clear way to distinguish between the two.
Pascal: Attribution or share-alike can already be showstoppers, and for some context can render LOD/LD non-reusable – in which case the term “open” appears to be rather misleading.
He is of course right about that: requirements like “attribution” and “share-alike” are showstoppers for some business models, but definitely not for data-driven businesses in general.
Let’s always look at the open-source analogue (they are a few years ahead of us): most open-source licenses require attribution, and quite a few prominent ones (such as the GPL) also require share-alike, and still open-source software is big business (look at Red Hat, the 1Bn open-source business IBM does every year with Linux alone, or all the open-source software used and produced by Internet and Web giants).
The share-alike requirement actually has two sides. It can prevent some businesses from reusing the data, but it also gives the original data publisher a competitive advantage, since the publisher can dual-license the data commercially without the share-alike requirement. So I think it is at least as much a business facilitator as it is a showstopper.
Pascal: Yes. But what you say confirms my argument that “open” is not so boolean in meaning
I think there are not many things of practical relevance that are purely boolean. The advancement of science in the last century (beginning with Einstein’s discovery of the theory of relativity) has shown that almost every paradigm is only valid in a certain context.
Nevertheless, the boundaries and implications of being “open” (according to the Open Definition) are from my POV pretty clear: if you can easily get, use, change/mix, and republish something (even commercially), it’s “open”. Openness does not, however, require the original publisher to give up the right to be acknowledged as such (attribution).
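To make the boolean reading concrete, here is a minimal sketch in Python. It is only an illustration of the argument above, not a legal classification and not the text of the Open Definition: the `LicenseTerms` class, its field names, and the three example licenses are all hypothetical and reduce a license to a handful of yes/no flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical, heavily simplified description of a data license."""
    name: str
    can_obtain: bool            # data can easily be obtained
    can_use: bool
    can_modify: bool            # change / mix
    can_republish: bool
    commercial_use_allowed: bool
    requires_attribution: bool = False
    requires_share_alike: bool = False


def is_open(terms: LicenseTerms) -> bool:
    """Boolean test: every one of the freedoms must be granted.

    Attribution and share-alike are conditions on reuse, not missing
    freedoms, so they deliberately play no role in the result.
    """
    return (
        terms.can_obtain
        and terms.can_use
        and terms.can_modify
        and terms.can_republish
        and terms.commercial_use_allowed
    )


# Hypothetical example licenses, described only by the flags above.
attribution_share_alike = LicenseTerms(
    "attribution + share-alike", True, True, True, True, True,
    requires_attribution=True, requires_share_alike=True,
)
non_commercial = LicenseTerms(
    "non-commercial only", True, True, True, True, False,
    requires_attribution=True,
)
public_domain = LicenseTerms(
    "public domain", True, True, True, True, True,
)

for terms in (attribution_share_alike, non_commercial, public_domain):
    print(f"{terms.name}: open={is_open(terms)}")
```

In this sketch the attribution and share-alike flags are recorded but never consulted by `is_open`, which mirrors the point that such conditions may rule out some business models without making a license any less “open”.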
If you want *ultimate freedom* without any limitations whatsoever, then *public domain* (German gemeinfrei) is what you are looking for. And I know there is an ongoing debate about balancing freedom against reserving some rights. Anyway, as the open-source example shows us (where exactly the same debate exists/existed), this does not prevent the emergence of a sustainable business ecosystem.
Side discussion on data licensing:
A few days ago I attended a talk by a German lawyer in the Federal Ministry of Economy about data licensing. He said that if you publish your data on the Web without access control, it is (at least in Germany) not protected by any IPR, and everyone can use the data, republish it and do with it as they please without asking the publisher. If this is really true, then at least for Germans all data published as Linked Data on the Web without any license would be Open Data too. Denny Vrandečić then responded:
“If this was true” (that data published in Germany on the Web without access control is free to be used as wished) “then no license would be able to take away your rights to do so. Per definition, a license is meant to grant you rights, not to restrict them.”
While Pascal pointed out:
When does this German law apply, given that the Web doesn’t really have borders?
If I’m a German (business) and use data published by a German (business), I’m always fine. In other cases at least nobody can sue me here for using data published on the Web, but you are right, I might not be able to travel everywhere anymore.
But joking aside: Google built a huge business around using texts from the Web, so what’s the problem with using data from the Web? We also should not always paint horror scenarios…