Can Data Scientists Avoid the Fate of Icarus?

Oct 1, 2019

By Gabriel Straub and Bill Thompson

Landscape with the Fall of Icarus (painting)

The enthusiasm with which data scientists have embraced data-driven business models, along with a lack of concern for the needs of users and society, means that we are flying very close to the sun in terms of the potential negative consequences of the technologies we are developing. It’s time to look more carefully at the real impact of the choices we make, before we end up crashing into the sea: our emerging profession is not yet so firmly established that we cannot consider rethinking its boundaries, obligations, regulation — and perhaps even its name.

xkcd 1838, 'Machine Learning': https://xkcd.com/1838/

Beyond Data Science

The widespread and increasing use of 'algorithmic' decision making in organisations of every size places an enormous responsibility on those whose work feeds into the emerging architectures of control that shape our world, and those of us who develop and deploy these systems must accept the role we play. It is not enough to claim objectivity or to use the terminology of science in an attempt to insulate ourselves from the subjective, messy world of company demands and political realities, and we cannot masquerade as mere engineers who create technologies to be deployed by others.

When British Prime Minister Winston Churchill said 'we shape our buildings, and afterwards our buildings shape us' he was referring to the way the layout of the House of Commons influenced the two-party political system. It is equally true to say that first we shape our algorithms, and afterwards our algorithms shape us. If the way modern society functions, and the choices made by companies, citizens and the state, are increasingly influenced by the output of systems designed by data scientists, then we must acknowledge this and reflect on what it means for our emerging profession and those practising it.

In a recent conversation with engineers at a large technology company it became clear that some of them had a problematic attitude to potential bias in the data sets they were dealing with, and that they did not see bias as their responsibility. They argued that bias was already there in the data and that whatever outcomes came from its use for training or decision-making were not their fault. They saw themselves as neutral processors, with no ethical or moral responsibilities provided their systems worked ‘as advertised’.
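To see why 'the bias was already in the data' is not a defence, consider a minimal sketch (our illustration, with invented names and synthetic data, not anything from that conversation): a scrupulously 'neutral' training pipeline, fed labels produced by biased historical decisions, encodes that bias into every future decision it makes.

```python
# Minimal illustration: a "neutral" pipeline learns historical bias.
# All data and names here are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

group = rng.integers(0, 2, size=n)   # protected attribute (0 or 1)
skill = rng.normal(size=n)           # true creditworthiness, independent of group

# Historical approvals were biased: group 1 needed a higher bar to pass.
approved = (skill > np.where(group == 1, 0.5, -0.5)).astype(int)

# The model itself is entirely standard and "objective"...
X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, approved)

# ...yet the learned coefficient on `group` is strongly negative: the
# pipeline has absorbed the historical bias and will now reproduce it.
print("coefficient on group:", model.coef_[0][1])
```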

It is of course in advertising that we see many of these contradictions most clearly illustrated. Anyone looking at the emerging data landscape will observe that a great deal of our energy and effort is expended serving the attention economy, trying to maximise the share of that most precious commodity, human attention. We gather, process and use any data we can get our hands on, in order to help the companies, corporations, charities, and governments that we serve get more attention, in a competition that becomes more and more intense over time, and one in which the resource we struggle over is very hard to create[1]. As a result we seem to spend too little time thinking about customers and audiences as people, instead trying to optimise customer behaviour to serve perceived business interests.

One outcome of that is a range of products and services filled with dark patterns designed to shape user behaviour, and user interfaces intended to produce addiction, like the way airline sites insist you reject multiple offers of cars, hotels, insurance and even luggage before you can pay for your ticket, or the way unsubscribe options are greyed out and set in small fonts. Machine learning allows us to create a different temptation for every person, optimised to their particular interests, with the main aim of keeping you hooked on the platform so that you can be exposed to adverts paid for by others. This tendency towards encouraging overuse is not a bug, but a carefully designed feature supported by the best analyses that data science can offer.

As well as the easily observable consequences in terms of how people's behaviour is influenced, there are wider issues to consider, most notably when it comes to the use of recommender systems in the news ecosystem. The products and algorithms that have been developed with the active participation of data scientists have led directly to the current crisis around misinformation ('fake news'), fake profiles and opinion shaping, because all of these have been massively exacerbated by well-funded actors who have learned how to game the system to generate profile, clicks and attention for their products[2].

If the goal of a technology is to attract and retain someone's attention by presenting them with material that will keep them interested, and if we are happy to exploit negative emotions like fear, distrust, anger, and even disgust along the way, then we are going to take audiences on a journey from established media to manufactured news to conspiracy theory, and on to downright horror, as with YouTube Kids' willingness to promote bizarre Peppa Pig videos to young children. The tendency of YouTube to take a viewer on a journey towards more extreme political content has also been well documented[3].

Both these examples emerge from the rigorous application of the principles of data science without consideration of individual or societal context.
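Mechanically there is no mystery here. The following schematic sketch (entirely our own, not any real platform's code; the names and numbers are invented) shows how a ranker whose only objective is predicted engagement will surface whatever holds attention, with no term in the objective for truth, wellbeing or harm.

```python
# Schematic sketch of an engagement-only recommender. Invented names and
# numbers; no real platform's code or data.
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    predicted_watch_minutes: float  # output of some engagement model

def rank(candidates: list[Item]) -> list[Item]:
    # The objective contains no notion of accuracy, wellbeing or harm:
    # only predicted attention counts.
    return sorted(candidates,
                  key=lambda item: item.predicted_watch_minutes,
                  reverse=True)

feed = rank([
    Item("Measured news explainer", 3.2),
    Item("Manufactured outrage", 7.9),
    Item("Conspiracy rabbit hole", 11.7),
])
print([item.title for item in feed])  # the rabbit hole comes first
```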

There is a parallel with the Greek myth of Icarus, who famously flew too close to the sun and melted his wings. Icarus was flying with his father Daedalus in order to escape the island of Crete[4]. Daedalus had designed and built a palace for King Minos, with a maze beneath it in which the half-man, half-bull Minotaur was imprisoned, but he later fell out with Minos and was trapped there. Constructing wings for himself and his son Icarus from wood, wax and feathers, father and son made their escape by flight, but Icarus, overjoyed at being able to fly like a bird, flew too close to the sun, causing the wax to melt, and fell into the sea.

Could it be that as data scientists, as enamoured of machine learning as Icarus was of flight, we are so close to the intense heat of today's online business models that we too are in danger of seeing our wings melt, of plunging into the sea? And just as it was Icarus' success in taking to the air that led him to disaster, could this be a consequence of our success at locating data science at the centre of an approach that puts optimising customer behaviour for commercial interests ahead of any concern for the welfare of the people involved?

The sun certainly burns bright: we cannot dispute the many successes of machine learning over the last five years. Data science techniques have driven large efficiencies and created new businesses through the ability to match billions of search queries to billions of documents. Companies can connect millions of drivers with millions of customers, or show billions of hours of content to audiences of billions.

This success has bred success and created a surge of interest in this space: in 2017 the VC industry invested $15.2bn in AI-related start-ups, according to CB Insights[5], although it is worth noting that 48% of that went to China and only 38% to the US. Even companies that have previously limited their use of machine learning (ML) are now embracing it — perhaps they are like Icarus too, in that they are only taking the enormous risk of flying with unproven technology because they fear being left behind to die. And the rewards, for those who can deploy the technology effectively, are certainly evident: the recent valuations of some tech companies are a stark example of how successful these techniques have been[6].

We would all like to believe that the tools and technologies being developed on the basis of data science will at least benefit our customers, but we may be as mistaken as Icarus, assuming that we can fly high without danger. We have seen how machine learning systems can lead to PR disasters, from Microsoft's racist chatbot, through Google's image recognition algorithm labelling black people as gorillas, to ML-based policing and sentencing apparently reinforcing biases against minorities in the judicial system[7].

Each of these should give us pause, but instead of checking how secure our feathers are we seem to be pushing onwards. Machine learning is taking control over larger parts of our lives, influencing whether we are admitted to university; whether we get a job; who we fall in love with; whether, and at what rate, we get a loan; and whether or not we are released on bail. The Chinese government's proposed social credit system shows how these separate elements could be linked together into one overarching system of control[8].

This is all happening despite growing evidence that some of the behaviours that companies are optimising for do not serve people's real needs. We encourage people to spend time online, but social media has a negative impact on well-being[9]; we develop algorithmic news services, and they seem to be delivering increasingly polarised media consumption[10]; and the gig economy, which relies entirely on ML systems to function, is challenging the economic inclusion of parts of society and is criticised for creating insecure jobs that offer workers little autonomy[11].

One reason for concern should be the growing awareness among politicians, the public, and regulators that machine learning systems are not serving the interests of consumers and voters, and a willingness to regulate behaviours they consider damaging. The concerns over Cambridge Analytica's use of Facebook data led directly to enormous pressure for better control over how personal data is used, and saw Mark Zuckerberg appearing in front of the US Senate and the European Parliament[12], the release of internal emails by a UK parliamentary committee[13] and the high-profile Netflix documentary The Great Hack[14]. Voices that support the break-up or regulation of big tech are growing louder and more persistent, to an extent that nobody can ignore.

The data scientists who have developed and deployed these systems have a special responsibility here, as they are serving the interests of companies whose profits increasingly rely on them. The first step to dealing with these concerns, and delivering ML systems that sustain and support people, whether they are citizens, users, or customers, is to acknowledge that data scientists do have choices. 'Data science' is not a scientific discipline like physics, seeking to uncover an underlying reality that can be described by fundamental laws of nature; it is a computing discipline with a range of applications, where our choices directly shape the outcome.

This is not to deny that ML looks like the answer to many of the questions that businesses and government face today. Just as flight was the only way for Daedalus and Icarus to escape prison in Crete, and therefore worth the risk, machine learning underpinned by data science is the best approach we have yet developed to address many key challenges. Our customers expect more and better personalisation, while companies look to cut costs and enhance performance — all in a febrile atmosphere where every business is expected to have its own ML offer and every startup has to present its ML credentials.

We can see this in almost every area, as the drive to more and better personalisation affects the media, retail, financial services, healthcare and government. As customers, we want it now, and we want it tailored to our consumption histories, preferences and inferred needs.

It is not just consumer or citizen demand that drives adoption. The widespread application of computing technology to every area of the modern world means that we are now looking for ways to optimise larger-scale systems, and ML offers a range of techniques that come closest to delivering solutions in resource planning and logistics, healthcare, agriculture and urban planning[15].

With ML playing such a central role in delivering so many of our aspirations for the next stage of development of a data-driven economy, the role of data scientists must be properly considered if we are not simply to become the next generation of technocrats who sustain, and by our silence support, a world where inequality, bias and unfairness are even further embedded in business practice and political systems.

If we simply present ourselves as neutral managers of data flows into and out of objective algorithms then we will deny the important roles we play and negate our possible positive influence.

The alternative is to challenge the narrative around machine learning and look for ways to develop it in ways that serve the wider public instead of those who pay for it: can we deliver machine learning that sustains and supports a public sphere? Can we be responsible fliers, finding a level that keeps us away from the sun’s heat but far enough above the waves to make it safely to our destination?

Challenging a dominant narrative is never straightforward, because much has been invested in developing and sustaining it, but unless we do so there will be no way to avoid complicity in the dark side of machine learning: the use of the tools we develop to entrench inequality, sustain the exploitation of workers and citizens, and reinforce existing structures of economic and political power. We can mitigate this risk by being open about our higher-level goals, taking accountability for the outcomes of our work, and being transparent about the choices and trade-offs we make.

As the world around us continues to speed up and become more complex; as more people expect more personalised services faster than ever before; and as more data and more options become available, machine learning increasingly looks like the only way to deal with the challenges of the modern age.

However our success in addressing these challenges will depend on our ability to create real agency for machine learning, and real agency will only come if we manage to develop systems that support the interests and ambitions both of individuals and of the wider society of which we are all part — and if we can convince everyone that we are doing this.

Calling ourselves ‘data scientists’, hiding behind the white coats of ‘objective science’, and pretending that we develop value-neutral systems whose deployment and consequences are the responsibility of others is neither sustainable nor ethical. We need to accept our continuing responsibility and work to ensure that the organisations we work for appreciate the serious implications of our success.

If machine learning comes to dominate the economy, politics, healthcare, education, transport and every other aspect of our lives, then those of us who design and deploy these systems must be open, accountable and transparent if we are going to be able to live with the consequences of our actions.

It’s therefore important for any organisation or individual working on ML-based services — especially those with broader societal impact like platforms, broadcasters, or government itself — to consider seriously how what they are doing serves the public interest. If the wax is starting to melt, it might be a good idea to fly less close to that all-powerful algorithmic star.

We believe that we can best do this by:

  • Being explicit about what 'fair' means to your organisation and about what trade-offs it is willing to accept (a minimal sketch of making fairness measurable follows this list)
  • Taking accountability for the algorithms inside your organisation, and owning the trade-offs they create
  • Being transparent and open around the choices that are made. This also includes being proactive about your narrative and continuously educating your customers and colleagues
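As a concrete illustration of the first point, here is a minimal sketch (ours, with invented variable names and synthetic data) of what 'being explicit about fair' can mean in practice: pick a measurable definition and report it. The two metrics below encode different notions of fairness and, when base rates differ between groups, generally cannot both be zero at once, which is precisely the kind of trade-off an organisation should state openly.

```python
# Minimal sketch: two explicit, measurable fairness definitions.
# Variable names and data are illustrative only.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Difference in positive-decision rates between the two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_gap(y_true, y_pred, group):
    """Difference in true-positive rates between the two groups."""
    tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
    return abs(tpr(0) - tpr(1))

# Synthetic decisions, just to show the metrics in use.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=1_000)
y_true = rng.integers(0, 2, size=1_000)
y_pred = rng.integers(0, 2, size=1_000)

print("demographic parity gap:", demographic_parity_gap(y_pred, group))
print("equal opportunity gap:", equal_opportunity_gap(y_true, y_pred, group))
```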

If we do this then, instead of being Icarus, taken along for the ride by our father, we can be more like Daedalus, the inventor who understands the limitations and dangers of his invention just as much as the opportunities it offers. And Daedalus, of course, landed safely.

Now read part two: what you can do about it.

Authors

Gabriel is the Head of Data Science and Architecture at the BBC, where his role is to help make the organisation more data-informed and to make it easier for product teams to build data- and machine-learning-powered products.

He is an Honorary Senior Research Associate at UCL, where his research interests focus on the application of data science to the retail and media industries. He also advises start-ups and VCs on data and machine learning strategies.

He was previously the Data Director at notonthehighstreet.com and Head of Data Science at Tesco. His teams have worked on a diverse range of problems, from search engines, recommendation engines and pricing optimisation to vehicle routing and store space optimisation.

Gabriel has an MA (Mathematics) from Cambridge and an MBA from London Business School.

Bill is a well-known technology journalist and advisor to arts and cultural organisations on matters related to digital technologies. He is a Principal Engineer in BBC Research & Development working on ways the BBC can deliver its public service mission online.

Bill has been working in, on and around the Internet since 1984, and was Internet Ambassador for PIPEX, the UK’s first commercial ISP, and Head of New Media at Guardian Newspapers where he built the paper’s first website.

He appears regularly on Digital Planet on BBC World Service radio and writes for a range of publications. Formerly a Visiting Professor at the Royal College of Art, he is an Adjunct Professor at Southampton University and a member of the Web Science Institute advisory board.

He is a former member of the boards of Writers’ Centre Norwich, Britten Sinfonia, and the Cambridge Film Trust. In 2016 he was awarded an Honorary Doctorate of Arts by Anglia Ruskin University. He manages the website Working for an MP (w4mp.org).

Bill has an MA (Natural Sciences) and the Diploma in Computer Science from Cambridge.

References

[1] See for example http://humanetech.com/problem/; https://www.cnet.com/news/mark-zuckerberg-savaged-by-south-park/; https://www.vox.com/recode/2019/7/31/20748732/josh-hawley-smart-act-social-media-addiction; https://techcrunch.com/2017/07/30/the-attention-economy-created-by-silicon-valley-is-bankrupting-us/; https://www.theguardian.com/technology/2017/oct/05/smartphone-addiction-silicon-valley-dystopia; https://www.wired.com/story/tristan-harris-tech-is-downgrading-humans-time-to-fight-back/

[2] See https://www.forbes.com/sites/forbesagencycouncil/2017/11/16/social-engineering-through-social-media-from-fake-profiles-to-russian-meddling/#4cf3efad3ef0

[3] See James Bridle's story at https://medium.com/@jamesbridle/something-is-wrong-on-the-internet-c39c471271d2 or this analysis of the move to extreme content: http://fortune.com/2018/03/11/youtube-extreme-content/

[4] See https://www.greekmyths-greekmythology.com/myth-of-daedalus-and-icarus/

[5] See https://www.cbinsights.com/reports/CB-Insights_State-of-Artificial-Intelligence-2018.pdf

[6] For example https://www.statista.com/statistics/263264/top-companies-in-the-world-by-market-value/

[7] Microsoft's chatbot: https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist; Google image recognition: https://blogs.wsj.com/digits/2015/07/01/google-mistakenly-tags-black-people-as-gorillas-showing-limits-of-algorithms/; ML-based policing: https://www.floridatechonline.com/blog/criminal-justice/4-problems-with-predictive-policing/; ML-based sentencing: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

[8] Hiring: https://hbr.org/2016/12/hiring-algorithms-are-not-neutral; dating: http://fortune.com/2017/02/14/eharmony-dating-machine-learning/; credit: http://www.emric.com/how-to-use-machine-learning-in-credit-scoring/; bail: https://theconversation.com/new-models-to-predict-recidivism-could-provide-better-way-to-deter-repeat-crime-44165; social control: https://www.businessinsider.com/china-social-credit-system-punishments-and-rewards-explained-2018-4

[9] See https://onlinelibrary.wiley.com/doi/full/10.1002/da.22466

[10] See https://press.princeton.edu/titles/10935.html

[11] See http://journals.sagepub.com/doi/abs/10.1177/1024258916687250

[12] See https://www.nytimes.com/2018/04/10/us/politics/mark-zuckerberg-testimony.html

[13] See https://mashable.com/article/uk-publishes-facebook-emails/

[14] See https://www.netflix.com/gb/title/80117542

[15] Healthcare: https://www.techemergence.com/machine-learning-healthcare-applications/; agriculture: https://www.sciencedirect.com/science/article/pii/S0168169917308803; urban planning: https://hackernoon.com/how-to-combine-some-machine-learning-methods-for-traffic-prediction-18bf4270881d?gi=e21795d99bd4
