Sketching a model of data ownership

Mar 1, 2018

UPDATE 28/04/19:

This was a very brief thought exercise, and in hindsight there are a few other aspects to tackle. Please now consider this just a live doc of notes, thoughts and scribbles, that I’ll add to, amend and maybe straighten out one day :)
I hadn’t heard of personal data stores when I wrote this last year, but they’re growing in importance. Irina Bolychevsky, from Redecentralise, wrote this piece on them. Tim Berners-Lee’s recently announced Solid venture is one such project.
Rashida Richardson, from the AI Now Institute, has also raised some important issues with individual ownership of data (e.g. where is the boundary for ownership of, say, medical records, for things like hereditary conditions?). I’m not sure I pressed this point sufficiently originally, and it suggests that even if data is collected/stored on a personal basis (see above), the content/intentionality of that data may reference others, complicating matters. This could suggest the need to collectivise ownership, if ownership is itself even suitable…
On that last point, Will Davies’ observation that data is really just ‘our impressions’ on someone else’s ‘infrastructure’ captures the ongoing difficulty of assessing how (much) value should flow to the infrastructure providers and the data factory itself. Of course, you could also construe ‘impressions’ as the labour that keeps that industrial complex flowing. ¯\_(ツ)_/¯
I’m also sceptical of putting a monetary value on data. Beyond almost certainly protecting incumbents, what other kind of behaviour/companies/activity might that (dis)incentivise? Indeed, is payment even a necessary condition of promoting public value? Could that be achieved just by making high + non-exploitative governance/usage standards a condition of access?
Finally, is control as important as, or even more important than, ownership? In a few years privacy-app Jumbo will, surely, just be a ‘data control all in one place’ app. The ODI’s recent work on data trusts sits somewhere in between this and Solid. Chuck in a self-sovereign ID and maybe we’re getting there.
Anyway, just some more caveats to an already over-caveated piece :)

I was recently asked what ‘owning your data’ might look like in practice. Here, I try to think through the problem and sketch out a possible way forward¹.

(Major Caveat: For now this is only a very brief thought exercise, with, admittedly, many conceptual and technical holes. I’m trying to write more as a way of exploring issues I’m interested in and to contribute to the wider discussion, but I certainly don’t have all the answers and there are likely countless better ways to approach this! Do share any suggestions!)

(Minor Caveat: I’ve mainly restricted this analysis to data as used for advertising, primarily in the context of social media. For a fuller account of data ownership, including things like health and financial data, I’m aware that further conceptual and technical work is needed.)

The challenge

With the rise of a small number of big tech companies — and governments using technology to watch their citizens — many people now believe technology only centralizes power rather than decentralizes it. — Mark Zuckerberg, Jan 2018

In this excellent New Scientist article [paywall], Hal Hodson outlines how the convenience of technology has lured consumers into giving up data which tech companies then aggregate and process into highly personalised advertising targets. Evgeny Morozov has gone even further, arguing that this is nothing more than a transaction: data for service, service for data.

This matters because we don’t know what companies know about us, what assumptions they’ve made, and how they’re handling that information. So how do we know this ‘transaction’ is such a good deal?

We are in an age where your data stream is made up not just of your interactions with a screen (including surreptitious ad-tracking) but also your card purchases, travelcard scans and a growing number of ‘smart’ devices that track your behaviour in the home.

This picture is so complex that it’s impossible for any individual to have a full sense of what it really looks like, and those who can (tech companies that process it in aggregate) wield ever more power as a result.

When they commercialise that insight, it’s easy to see why some apply the famous “if you’re not paying, you’re the product” phrase here. On one level, there is value being created that we as consumers have contributed to yet receive no direct benefit from, beyond the convenience of a given service. On a deeper level, we might ask whether this is even an ethical business practice.

The proposal

One of the responses to this state of affairs is for consumers to ‘own their data’. It pays to consider what this means for three main areas of concern:

Data collection: Any data collected through the devices and services you use would need to be sent directly to a place of your choosing, rather than that service or device’s own servers.
Data storage/management: For reasons of privacy and security, you may decide to run your own server and store a raw data stream. Or, for convenience, you might outsource this to a company who will turn your raw data into meaningful insights for you (based on their ability to aggregate their users’ data).
Data-based insights: Whether you have a company that provides insights at this stage, or you sell your raw data to other services who produce their own insights, the apparent benefit of this model is that you at least have control over the process.

So, in practice, Facebook would have to ask permission to access my data stream, which would be stored elsewhere (i.e. not on Facebook’s servers) and which I would have full access to and control over. For example, maybe I’m happy to be shown adverts for clothing or electronics, but nothing else. In theory, I can allow Facebook to access the data that are relevant to those domains, without also giving them access to everything else I can see on my dashboard (UI details TBC)².
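To make the permission idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `PersonalDataStore` class, the category labels and the `grant`/`revoke`/`read` methods are made up, and it is not how Facebook, Solid or any real service works. It only shows the basic shape of ‘a service sees the categories I’ve allowed and nothing else’.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    category: str  # hypothetical label, e.g. "clothing" or "health"
    source: str    # where the data point came from
    value: str     # the raw observation


@dataclass
class PersonalDataStore:
    """Hypothetical user-controlled store; not any real product's API."""
    points: list[DataPoint] = field(default_factory=list)
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, service: str, categories: set[str]) -> None:
        # The owner allows a service to read the named categories.
        self.grants.setdefault(service, set()).update(categories)

    def revoke(self, service: str) -> None:
        # The owner withdraws all access for a service.
        self.grants.pop(service, None)

    def read(self, service: str) -> list[DataPoint]:
        # A service only ever sees data in categories it was granted.
        allowed = self.grants.get(service, set())
        return [p for p in self.points if p.category in allowed]


store = PersonalDataStore()
store.points += [
    DataPoint("clothing", "card purchase", "bought trainers"),
    DataPoint("electronics", "browser", "viewed headphones"),
    DataPoint("health", "smart scale", "weight reading"),
]

# Happy to be advertised clothing or electronics, but nothing else.
store.grant("facebook", {"clothing", "electronics"})
visible = store.read("facebook")
```

Here `visible` contains the clothing and electronics data points but not the health one, and calling `store.revoke("facebook")` leaves the service with nothing. A real system would also need authentication, auditing and some answer to the question of who assigns the categories, none of which is attempted here.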

How can we make this picture a little more detailed?

Firstly, we can borrow from the Open Data Institute’s work on open data standards, which they argue would “help to change markets, create open ecosystems and implement policy objectives”. Any user interface that would seek to compile and process all known data points about an individual would first have to be able to read data from a wide range of sources, most likely using an API, and common, open standards would encourage portability between different services.
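As a rough illustration of why a common standard helps, the sketch below maps records from two differently shaped sources onto one shared format. The field names, the two source formats and the `normalise` function are all invented for this example; they are not taken from the ODI’s work or from any real API.

```python
from typing import Callable

# A made-up common schema that every record is mapped onto.
COMMON_FIELDS = ("source", "timestamp", "category", "detail")


def from_card_purchase(raw: dict) -> dict:
    # Hypothetical shape of a bank's card-purchase record.
    return {
        "source": "card",
        "timestamp": raw["date"],
        "category": raw["merchant_type"],
        "detail": raw["merchant"],
    }


def from_travelcard_scan(raw: dict) -> dict:
    # Hypothetical shape of a transport operator's scan record.
    return {
        "source": "travelcard",
        "timestamp": raw["scanned_at"],
        "category": "travel",
        "detail": raw["station"],
    }


# One adapter per source; a shared open standard would make these unnecessary.
ADAPTERS: dict[str, Callable[[dict], dict]] = {
    "card": from_card_purchase,
    "travelcard": from_travelcard_scan,
}


def normalise(source: str, raw: dict) -> dict:
    """Convert a source-specific record into the common schema."""
    if source not in ADAPTERS:
        raise ValueError(f"no adapter for source: {source}")
    return ADAPTERS[source](raw)


stream = [
    normalise("card", {"date": "2018-03-01", "merchant_type": "clothing", "merchant": "Shoe Shop"}),
    normalise("travelcard", {"scanned_at": "2018-03-01T08:15", "station": "Brixton"}),
]
```

Every record in `stream` now has the same four fields, so a dashboard (or a competing service) can read them without caring where they came from. Without a shared standard, each new source needs its own adapter, which is exactly the friction that open standards are meant to remove.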

To provide a more detailed picture still, we must also consider Benedict Evans’ important critique of the data ownership concept as a whole:

As he outlines, my data stream on its own is simply not meaningful to anyone. But what we’re really after is transparent, accountable and reliable use of data, and this is where the obscurity of proprietary, closed systems can fall short. If your main criticism of personal data ownership is a lack of commercial value, you’re missing the point.

Yes we need to investigate ways of harnessing the value of data, but we must also respect what’s important socially and ethically.

Building on this idea, we can introduce a role for intermediaries between consumers (data-producers) and companies (data-users). These brokers could represent thousands or even millions of individuals, leveraging their collective value to negotiate better terms, including the possibility of extracting a usage fee, with organisations.

With more time dedicated to thinking this through (than I will give here), this could perhaps be a workable solution. However, at first glance, we seem only to have shifted the problem of data storage and management onto a new set of companies by creating a marketplace in which brokers may be incentivised to play both sides. This should be avoided.

As a result, an alternative option could be a form of data management ‘union’, in which consumers use their collective bargaining power to negotiate better terms while retaining ownership, such that they are not susceptible to the same market-based incentives as brokers.

Perhaps the ideal is something even more radical. As the supply of data increases over time, its value is increasingly extrinsic. As I’ve outlined, the real value is in the means of analysis. So, rather than a data union bargaining with tech companies…who in turn produce targeting insights…which they monetise with advertising, is there potential for a cooperative (social) platform to produce its own insights, working directly with advertisers?

The network effects of existing platforms are certainly a barrier here but, in conjunction with open standards, these walled-off communities would lose some of that power, allowing for new competitors³.

A final option is for governments to act as brokers, with citizens retaining ownership over their data. This could open up revenue-raising opportunities for the Treasury (through fees for data use), which could put the pooled sum to more effective use than paying each citizen a tiny dividend.

Still, there are counterpoints to this, too. Any detailed work on this would have to take account of privacy concerns (centralising data in government is no recipe for success), and at no stage should governments themselves have unfettered access to data sets, both because of those privacy concerns and because they too are customers.

In sum, socially-minded, market-based solutions seem to strike a good balance, so where we can promote them, that should be the aim. However, if alternative models are required we should not hesitate to explore them in more detail.

Footnotes:

1. Thanks to Lorna Pittaway for her advice.
2. Facebook’s insights are often far more complex than this, such as their ability to spot potentially suicidal users. (My model is admittedly, and deliberately, extremely simplified. Still, it’s worth noting that any comprehensive account would need to process and communicate (and report?) various nonstandard data points such as this.)
3. Mozilla is a good example of how a mission-driven organisation could also compete when it comes to recruitment.
Update 03/04: clarified language around data portability/APIs.
Update 28/04/19: added further points.

Written by Andrew Bennett