The Problem of Data

On September 4th 2020, WEDF's John Marshall gave an introductory talk at London Tech Week's Ethical Data Festival entitled "The Problem of Data". A non-technical survey, this is a direct transcription of a talk that explored the scope and nature of some of the issues around data. Implicating the entire knowledge economy, and embracing ‘fake news,’ media narratives, and the impact of social media quite as naturally as considerations of bias in data analytics or AI, the issues are non-obvious and may in fact be the most important of our age.

John Marshall

Broadcast on September 4th 2020

This is a provisional transcript. Copy may not be in its final form.

"The Problem of Data"

JDM: Hi, my name is John Marshall. I'm Director of the World Ethical Data Foundation and CEO of the World Ethical Data Forum. I'd like to talk just very briefly about a couple of the peculiar qualities that data has, and of the peculiar qualities of the ways we think about data, and some of the ways in which we can begin to reorient our thinking about the ethics that — or rather the questions about ethics that — are arising in consequence of these questions about data.

One thing that perhaps should be said in advance is that we do tend to, and we ought not to, conform and dote on technological progress at any price. And that this dotage is definitely an infirmity when it comes at the cost of civil liberties – unambiguously understood. You know, we have to understand the subtlety and the nuance here, but we also have to realize that we are not floating in a void. There is a lot of history, traditionally, culturally behind us, and it would be wise to remember that we may want to at least entertain the idea of how we might explain to previous generations, that despite all the literary and cinematic warnings, despite all the philosophy, and critics, and historical scholarship, that we may well be flinging ourselves into a state of incredible vulnerability. And it’s not even because the considerations are so wonderfully subtle that they’re beyond the mainstay of the populace. I mean, they are as subtle as the fleets of surveillance dirigibles in some instances across the US. So it would be wise to at least be capable of beginning to heed the warnings that China is currently furnishing us with, and Russia has done, and the history of the West also, with an embarrassment of riches...

Now that said, it doesn't change the fact that we are hopeless at deciding how we wish to live. We’re very good at working out what we find abhorrent and awful and avoidable, and what we find distasteful; we are not so good at the positive exemplar, morally, or in terms of the kinds of state or civilisation we wish to realise. That's nothing to do with us or our failings, that is just, I think, a condition of human life. I mean, Plato didn't do any better when he wrote the Republic and dreamt up the Kallipolis. I mean he famously says... infamously says... that even the blueprint of this perfect state was going to happen through, you know, divine intervention or else sheer luck... A bit of modesty therefore may be in order, and that we recognise we may want not to have to hope for such divine intervention or hope that it's going to happen due to luck or happenstance; rather we may wish to muddle along and try to work out the right way of establishing a discussion that allows us at least a chance of composing our differences with our fellows… and with other cultures and traditions.

Data is one of those areas now which I think allows us to recognize that, when dealing with the sorts of technology that have become such a part of the fabric of life, we can begin to make, you know, distinctions with no difference practically... We speak of data but what we mean, in essence, is something not distinct from the stuff of the life we're living on a daily basis: whether it's the economy and its being a digital economy, or it's a democracy and its having to do with informational democracy and information flow, and to do with the ways that we vote, or it's to do even with military infrastructure, and so this aspect of the state as well, and national security which now, again, is to do with a nation and national security that seems to have no boundaries, given the uncontainability of data other than by cryptographic means...

Despite all of those complications, however, we are able to recognize that there's a great bifurcation. The positives of data for life are potentially so great that it might be, unambiguously, maybe the only unambiguous moral failing not to pursue them and their realization. And yet the negatives are so severe as well that we can't help but recognize that civil liberties can be trounced if we don't, I think, rub our eyes and ears and pay attention to both what is the case and what the implications may well be. A bit of historic context might come in handy from time to time in this regard. When we've tended to allow people to concentrate power into their own hands — largely unelected hands — that can be quite dangerous. These are serious questions anyway... so let's just begin to look at data generally, and how in terms of the technical sorts of data we are using computers to do all sorts of things: we can simulate the weather, we can analyse DNA sequences, population growth; we look at financial markets; we can model the beginning and the expansion and even the end of the universe, dependent upon those models; we look at chaotic systems; we can use data to test for and diagnose illness, potentially saving millions of lives; we can and have been modelling human cognition and behaviour; quantum states; we've been looking at the interactions of individuals, you know, with systems online, and with one another, and even to predict the results of elections. All of this is the kind of impartial analytic non-interventionist observation of systems and data sets designed to understand how they evolve according to their own principles that has proven invaluable to science and predictive analytics.

However, in addition to such structuring and analysis of data in search of new insights, also exists an ever more lucrative market that recruits AI and the potential of data in order to help shape and determine these very behaviours. A recent and notorious case in point is that of Cambridge Analytica, now something everybody knows after The Great Hack documentary, but it's worth just remembering, as almost a symbol, how their psychographics, or rather their work with psychographics meant that they were able to identify individuals most susceptible to political persuasion, capable of being drawn over into another way of thinking or voting or otherwise thereby confirmed in their beliefs, and how they then used this analysis to feed content intended to shape attitudes, opinions, and ultimately the voting behaviour of these individuals according to the interests of their clients. Now, manipulation of data has long been exploited for the achievement of financial and political ends, it's not at all new; gerrymandering and malapportionment of the electorate are supreme examples of this. Yet, so goes the claim ...although it is worth remembering Cory Doctorow's essay on our nervousness about mind control and about Manchurianism: it's a timely corrective actually, that essay; yet, so goes the argument, never before has a gerrymandering of an individual's personal values and beliefs been so easily accomplishable as it currently is.
And with already well over half the world's entire population regularly able to access the internet and so subject to such persuasive technologies, and with more and more of our life coming down to the ways in which the AI which govern those systems involve us in their processes to encourage and manipulate engagement, there may well be no area in today's world with quite so much at stake, given how it implicates every institution of importance, from the markets already mentioned, to the democratic process simpliciter, and the people thereby voted into power and able to make decisions about war, and COVID-19 policy, and anything else. So at least taking seriously the current state of affairs around data, data harvesting and use, and the technologies which are developing, and how great a leverage and influence over the future the control of these domains is, we can begin to ask a few I think rather interesting questions about the ambiguity of certain of the concepts involved.

So a couple of complicating factors, and these are rather interesting… The scope and the nature of the issues around data are not at all immediately obvious. Data isn't only a technical problem… you know, how data is generated, how it's optimised, how it's put to work... but importantly there's also the semiotic concern and the question of how data comes to have meaning at all and be useful; so that whether the term data is used in a general way, to describe gathered bodies of facts, the stuff of Wikipedia or science articles, or it's used more technically to designate the materials of statistics or data analytics, whatever the case, a datum is only ever a unit of potential information until it's placed in an interpretative framework or a theory, some sort of context that grounds it and which gives it significance... Now that semiotic fact is very very trivial; it's not going to shock anybody; but the consequences of it are far reaching and they're worth reflecting on. Because the frameworks which interpret and ground those data are of course not restricted to science or mathematics or statistics, but rightly understood can be seen to form hierarchies of shifting interdependent and often competing narratives and priorities that run the whole gamut, from the stuff of, you know, from the dialectical, in pursuit of truth, to the realpolitik, which take in even the narratives propounded by media and government, which at their worst dispense with, or downgrade the truthfulness of the claims made and instead simply seek advantage from them. So although extremely valuable in their own right, to embrace the technical considerations of data alone, exclusively, would thereby be to miss a significant point. Precisely because of this dependency on their frameworks, for all a fact and the interpretative scheme that grounds it are conceptually distinguishable they're also conceptually and practically indissociable.
The framework and the context are immanent in the claim, or the statement, or the fact, so while we can think them differently we can't actually pull them apart. What that means, in essence, is that the problem of data implicates the entire knowledge economy, and it embraces fake news, media narratives, and the impact of social media quite as naturally as the considerations of bias in data analytics and AI.

It is for this reason actually that we spend so much time at WEDF talking to theorists and journalists and politicians as well as those directly implicated or directly involved in the creation of data and data technologies. It's precisely so that we can address the question as responsibly and as richly as possible. Now what is fascinating also is that in addition to the oddness around the idea of data is the oddness around the complacency with which we treat of ethics. I mean ethics boards are springing up all around the place. Again, notoriously, the one at Google which lasted just a few days until it became a public issue, until it became something that… it isn't even a matter of diminishing returns, but rather was a cynical marketing ploy that didn't work out particularly well it would appear. But for all the questions around the ways that we ought to live seem to be tricky, there are a great many boards of that sort keeping track of the answers being given, as if there were a sort of domain or an area of expertise relevant to ethical judgments. One of the profound problems is that questions like this are seldom straightforward. Ethical questions generally are seldom straightforward, but especially not so in an area the scope and implications of which are unclear, it being new, and for which our intuitions, our moral intuitions as well as our intellectual intuitions, are potentially ill-suited. We're hopeless, intellectually, at thinking, for example, data aggregation... it's very hard to keep all of the relevant things in mind at a sufficient level of complexity and dynamism to really do anything useful with it. The Snowden disclosures are a case in point, where lots of information was given us about the ways in which metadata are used by surveillance institutes. We see the fact of it, we recognise it, but we don't easily understand, I think, how deep that goes. It's hard to keep this in mind.

Anyway, the point is that because of the curious lack of fit between our intuitions and the ethical questions that seem to arise, and the nature, in essence, of technology and data itself, this may be one of the times that we ought to try to avoid demagoguery. So if there's a right answer actually, potentially the answer is simply to establish an on-going discussion that strives to identify and minimise the chances that we make the kinds of potentially catastrophic errors that definitive answers can incline us to, and to maximize the chances of delivering positive goods, the kinds of goods that these technologies so amply promise. So again, what I would counsel is a bit of modesty therefore, in light of these questions; because, to repeat, issues as unclear, significant, and as urgent as these ones, those around knowledge technologies generally… and I don't just mean data and AI, I also mean the printing press, free speech… these aren't I think a matter for demagoguery but of balanced and collaborative democratic effort by people of all perspectives and traditions and even cultures, given the way that the consequences of the assimilation of these technologies implicate us all in I believe ways which are now almost a foregone conclusion.

So what I would like to veer into as a way of summarising, I think, the condition... that we have all of these peculiar... we have a lack of fit between our intuitions and the ways that we're attempting to understand new technologies... that our ethical intuitions may well be a poor fit for this, and we need to start asking those questions well... is to insist that we don't get distracted, that we don't have our attention diverted by the technical details; that we have to remember the matters of principle which are behind that, I mean which are fundamental to the issues. This requires a bit of courage actually because it has us have to for the first time — and we're hopeless, historically we've been hopeless, at determining answers to these questions — we have to begin to weigh up what sort of future we wish to live, and what kind of life we wish to live, not just individually, but collectively; how do we wish society or civilisation to be... not least because this is no longer just a matter of, you know, Britain being Britain, surrounded by water, or America unto itself, or China unto itself, because there is now a peculiar community globally, and this is going to affect the sorts of compromises we're forced to make about the ways that we're forging a future, which again will contain and implicate everybody potentially. So we are living I think in a different world now and there has to be integrity in the process of this discussion. But it doesn't change the fact that we have to step back and take on responsibility for weighing those things ourselves. We can't resign ourselves to the inevitability of history realising itself in a desirable way. As I've said, that has tended never to work out very well historically. It's also the case that this is bystander effect raised to the nth degree, and about such an important question it is not even on the table. 
Seneca has a beautiful expression where he talks about how unless we know which port we're sailing to no wind is helpful. That is more or less how we stand in relation to the technology question, the data question. Where we have on the one hand negatives, which are so serious, and positives which are so very real that the moral failure, to repeat, may well be not to pursue their possibility, and so what we want to do is to not treat this like a Scylla and Charybdis, because in fact it's not; it's rather that we have to steer it so that we get the one rather than the other, and we can only do that if we have a clear idea of both. We can get a good sense of what we do not wish to dispense with, which parts of our civilization, which parts of our culture, do we wish not to let go of because they're essential, crucial, even structurally, to the right and healthy functioning of a democracy for example, and which positives do we wish to pursue. We have to forge for ourselves moral or political exemplars that allow us to direct our course. Not an easy matter... Plato failed, Marx failed, and it’s been a rather hellish history in the... Hegel failed ...all of history is a bit of a mess as far as political theory goes. However, we nonetheless are at a place where we can't but make the choice, and if we don't then we're choosing simply to resign ourselves to better heads than our own as we would hope; and my feeling is, given that many of the decisions being taken about the issues which are determining the shape of the future are either obscurely or market-motivated, we would be I think mistaken.
