Building out legal permissions on the semantic web

So, no surprise, given my recent trips and talks on open data, I’ve been thinking more and more about semantic web technologies and the law. This post represents some of my early-stage thinking about how copyright fits into the coming framework.

For those not familiar with this area, here is my big-picture layman’s summary of the semantic web / linked data: make more stuff machine readable so that we can do smarter and better things with machines.

One of the strands of semantic web development deals with building copyright (and other IP) permissions into the framework. The idea is that you can find out which rights cover what, where to go to get copyright permissions, and so on, generally by adding metadata (data about data).

Going back to my lay interpretation, this means “making copyright permissions machine readable so that machines can do smarter and better stuff when dealing with copyright permissions”.

Creative Commons, for example, has started this by giving each of its licenses machine readable code and by developing standards, such as ccREL (the Creative Commons Rights Expression Language), for these machine readable expressions of its licenses. Incidentally, it offers its licenses in three versions: human readable (a summary), lawyer readable (the actual license) and machine readable (the extra stuff in the copy-and-paste code it provides).
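As a rough illustration of what “machine readable” means here, this sketch writes a few ccREL-style statements as N-Triples using only the Python standard library. The predicate names (`permits`, `requires`, `license`, `attributionName`, `attributionURL`) and classes (`Reproduction`, `Distribution` and so on) are taken from the ccREL vocabulary as I understand it and should be checked against the spec before relying on them; the work and author URLs are hypothetical. In practice CC’s copy-and-paste code embeds this kind of data as RDFa in HTML rather than printing triples.

```python
# Minimal ccREL-style metadata emitter (a sketch, not CC's own tooling).
# Term names come from the ccREL vocabulary at http://creativecommons.org/ns#;
# verify them against the ccREL specification before relying on this.
CC_NS = "http://creativecommons.org/ns#"
BY_SA_30 = "http://creativecommons.org/licenses/by-sa/3.0/"


def iri(value: str) -> str:
    """Write an absolute IRI the way N-Triples expects: <...>."""
    return f"<{value}>"


def cc(term: str) -> str:
    """Expand a short ccREL term such as 'permits' into a full IRI."""
    return iri(CC_NS + term)


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def license_triples(license_url, permits, requires, prohibits=()):
    """Describe what a license permits, requires and prohibits."""
    subject = iri(license_url)
    triples = []
    for predicate, terms in (("permits", permits),
                             ("requires", requires),
                             ("prohibits", prohibits)):
        for term in terms:
            triples.append(f"{subject} {cc(predicate)} {cc(term)} .")
    return triples


def work_triples(work_url, license_url, attribution_name, attribution_url):
    """Attach a license and attribution details to an individual work."""
    subject = iri(work_url)
    return [
        f"{subject} {cc('license')} {iri(license_url)} .",
        f"{subject} {cc('attributionName')} {literal(attribution_name)} .",
        f"{subject} {cc('attributionURL')} {iri(attribution_url)} .",
    ]


# Hypothetical work and author URLs, for illustration only.
lines = license_triples(
    BY_SA_30,
    permits=["Reproduction", "Distribution", "DerivativeWorks"],
    requires=["Notice", "Attribution", "ShareAlike"],
) + work_triples("http://example.org/photos/42", BY_SA_30,
                 "Jane Example", "http://example.org/jane")

print("\n".join(lines))
```

The license description is written once per license, while each work only needs to point at the license and carry its own attribution details.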

Relatedly, at ISWC (the International Semantic Web Conference) there was a really interesting presentation of a paper that looked at attribution, Creative Commons and Flickr within a semantic web framework, and at ways to make it easier to attribute works in compliance with CC licenses.
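Attribution is a good example of where machine readable metadata pays off: if a photo carries its title, author name, source and license as data, a tool can assemble the credit line instead of the user. Here is a minimal sketch, assuming a hypothetical `WorkInfo` record whose fields loosely mirror ccREL’s `attributionName` and `attributionURL`, with an output format loosely following the Title, Author, Source, License pattern CC suggests for attribution. It is an illustration only, not the method from the ISWC paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkInfo:
    """Hypothetical record of the metadata an attribution line needs."""
    title: str
    attribution_name: str                  # cf. ccREL's cc:attributionName
    source_url: str                        # where the work was found
    license_name: str                      # e.g. "CC BY-SA 3.0"
    license_url: str
    attribution_url: Optional[str] = None  # cf. ccREL's cc:attributionURL


def attribution_text(work: WorkInfo) -> str:
    """Build a Title / Author / Source / License attribution line."""
    author = work.attribution_name
    if work.attribution_url:
        author = f"{author} ({work.attribution_url})"
    return (f'"{work.title}" by {author}, from {work.source_url}, '
            f"licensed under {work.license_name} ({work.license_url})")


# Hypothetical example work.
photo = WorkInfo(
    title="Sunset",
    attribution_name="Jane Example",
    source_url="http://example.org/photos/42",
    license_name="CC BY-SA 3.0",
    license_url="http://creativecommons.org/licenses/by-sa/3.0/",
    attribution_url="http://example.org/jane",
)
print(attribution_text(photo))
```

Because the author URL is optional, the same function still produces a usable credit line when the rightsholder has only supplied a name.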

I’m not qualified to go into deep detail on the technical side of building rights into the semantic web, so I’ll leave that to others. I’m thinking more about the big picture: how you build out such a framework for copyright and what approach you take.

Where do you start when trying to describe copyright licenses for the web?

I see (and have seen presented by others) three options:

  • Option 1. Start with copyright law and write out permissions based on each of the individual rights bundled up in copyright.
  • Option 2. Start with what users might do with a work, and then decide whether to grant them permission.
  • Option 3. Start with current copyright licensing practice: how licensors bundle and use copyright today.

I see Option 2 or Option 3 as the only real way to go. Starting with copyright law (Option 1) and expressing the rights in its terms, such as simply “distribution”, paints with entirely too broad a brush. Expressing a permission as plain “distribution” misses the fine-grained control that copyright gives rightsholders.

For example, industry practice (say, in the movie industry) often breaks the broad distribution right down into very fine-grained slices, such as:

  • by geographic region – North America market versus European market
  • by media type – theatrical versus satellite versus DVD rights
  • by time – licenses last for a set number of years
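To make the contrast with a bare “distribution” permission concrete, here is a minimal sketch of a permission record that carries the three dimensions listed above: media, territory and term. The field names, media labels and territory labels are hypothetical, made up for illustration; they are not drawn from any existing rights vocabulary.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    """One slice of the distribution right, as a licensor might sell it.

    Field values are hypothetical labels, not a standard vocabulary.
    """
    media: frozenset[str]        # e.g. {"theatrical", "dvd"}
    territories: frozenset[str]  # e.g. {"north_america"}
    start: date                  # first day of the license term
    end: date                    # last day of the license term (inclusive)

    def covers(self, medium: str, territory: str, on: date) -> bool:
        return (medium in self.media
                and territory in self.territories
                and self.start <= on <= self.end)


def is_permitted(grants: list[Grant], medium: str, territory: str, on: date) -> bool:
    """A use is allowed if any single grant covers medium, territory and date."""
    return any(g.covers(medium, territory, on) for g in grants)


grants = [
    Grant(frozenset({"theatrical", "dvd"}), frozenset({"north_america"}),
          date(2010, 1, 1), date(2014, 12, 31)),
    Grant(frozenset({"satellite"}), frozenset({"europe"}),
          date(2011, 1, 1), date(2012, 12, 31)),
]

print(is_permitted(grants, "dvd", "north_america", date(2012, 6, 1)))  # True
print(is_permitted(grants, "dvd", "europe", date(2012, 6, 1)))         # False: wrong territory
print(is_permitted(grants, "satellite", "europe", date(2013, 6, 1)))   # False: term expired
```

A single “distribution: yes” flag could not distinguish these three cases, which is the point of breaking the right down the way licensors already do.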

Option 1 (starting with copyright law) also has a further wrinkle: which copyright law do you use? Copyright consists of national rights harmonised by international treaties. The Berne Convention (or rather, Berne via TRIPS) sets a floor, not a ceiling, and member states vary fairly widely in how they implement and enforce it. Using Berne as a “copyright law for the global internet” may be tempting but is inaccurate: 171 countries on the internet mean 171 different sets of copyright law. What a specific right such as “distribution” means in one place may differ from what it means somewhere else, and you have to find ways to express those differences (though that is not to say it can’t be done, or that semantic web technologies aren’t already addressing the problem of differing definitions).

Options 2 and 3 admittedly aren’t too far apart from each other. Mainly I see this as a difference in tone rather than a deep divide:

  • Option 2 starts with the hypothetical user and asks what he or she could possibly do with the work, versus
  • Option 3 starts with industry licensing practice and asks how licensors typically license their works.

I think Option 3 is probably the more practical of the two: while copyright law may at times allow super fine-grained control, what matters is the level of control most rightsholders actually exercise and how they bundle those rights. Mechanical rights, for example, are the industry’s name for the right to reproduce and distribute a musical work on recordings such as CDs, but they aren’t a single right granted by statute; they combine the separate reproduction and distribution rights.
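Starting from industry practice need not lose the link to the statute. Here is a minimal sketch, assuming a hypothetical vocabulary in which licensors describe grants using industry bundle names like “mechanical”, and software expands those names into the underlying rights when it needs to answer an Option 2 style question about a particular use. The right labels (`reproduce`, `distribute`, `perform_publicly`) are made up for illustration.

```python
# Hypothetical vocabulary mapping an industry bundle name onto the separate
# statutory rights it combines. Following the text, "mechanical" bundles
# reproduction and distribution; the labels themselves are illustrative.
BUNDLES = {
    "mechanical": frozenset({"reproduce", "distribute"}),
}


def expand(granted):
    """Turn a set of granted names into the underlying statutory rights.

    Names that are not known bundles are treated as individual rights,
    so a license can mix bundle names and single rights.
    """
    rights = set()
    for name in granted:
        rights |= BUNDLES.get(name, frozenset({name}))
    return frozenset(rights)


def allows(granted, right):
    """Answer a user-style question ("may I reproduce this?") from an
    industry-style grant ("you hold the mechanical rights")."""
    return right in expand(granted)


print(sorted(expand({"mechanical"})))              # ['distribute', 'reproduce']
print(allows({"mechanical"}, "reproduce"))         # True
print(allows({"mechanical"}, "perform_publicly"))  # False
```

Licensors keep describing deals in the terms they already use, while the expansion table is the one place that has to know how those terms map onto the law.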

Either way, more fine-grained expressions of copyright will be built into the next generation of web technologies; indeed, this has already started with ccREL and others. Starting with existing copyright practice and building out from there seems to make the most sense to me.