I have made no secret of my enthusiasm for the benefits of big data. I like that my computer recommends books to me. I’ve rarely, if ever, been disappointed with the suggestions I’ve received. On the contrary, I usually marvel at how well these algorithms know me and the books I like to read.

When music made the transition to digital and then to streaming, it was hard to tell whether I was more excited by the vast catalogue of music I now had at my fingertips or by the smarts built into these streaming services that allowed them to play me endless selections of non-repeating music, perfectly curated to match my tastes. I find I no longer turn to friends for music recommendations. Instead, I fire up an algorithm that picks, from the largest selection of music ever assembled in the history of the planet, new artists I’ve never heard of and music I’d never otherwise have experienced. More often than not, I find that the music these services select matches my tastes and preferences well enough for me to cede control of my playlist to them.

Streaming music services need to engage us, capturing our attention so that we remain on their platform. The longer we listen to the music that they serve us, the more their algorithms will get to know what we like and play us music better aligned with our tastes. E-book libraries are only able to generate revenue if they can entice us to buy a new book immediately after we have finished reading the last one. They have a better chance of doing that successfully if they can figure out exactly what will appeal to us.

For digital distribution to work, content must be paired with preferences. When customers get what they want before they even know they want it, they will begin to rely on the service to the exclusion of all other options. Digital content providers spend an inordinate amount of effort categorising content, tagging it by style, genre, mood and as many other parameters as possible. On top of all this they layer relevant social information: what people are saying about this content and what others in the same demographic are consuming. Algorithms process all this data, cross-referencing it against the vast stores of digital content they are authorised to license, and present their customers with content that has been perfectly curated to suit their preferences. Thanks to these digital recommendation engines, I’ve been introduced to artists and books that I might otherwise never have heard of. Seldom have I been presented with a book that I dislike or an artist whose music leaves me cold.
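
For the technically inclined, the pairing of tagged content with preferences can be sketched in a few lines of Python. This is only a toy illustration and not how any particular service works: the catalogue, the tags, the peer play counts and the `social_weight` parameter are all invented for the example.

```python
from collections import Counter

# Hypothetical catalogue: each title is tagged by style, genre and mood.
CATALOGUE = {
    "Quiet Rivers": {"folk", "acoustic", "calm"},
    "Neon Nights": {"electronic", "dance", "upbeat"},
    "Porch Songs": {"folk", "acoustic", "upbeat"},
    "Steel Horizon": {"metal", "loud", "fast"},
}

# Hypothetical social signal: plays by listeners in the same demographic.
PEER_PLAYS = {
    "Quiet Rivers": 40,
    "Neon Nights": 90,
    "Porch Songs": 60,
    "Steel Horizon": 10,
}


def build_profile(history):
    """Count how often each tag appears in the titles a listener has played."""
    profile = Counter()
    for title in history:
        profile.update(CATALOGUE[title])
    return profile


def recommend(history, top_n=2, social_weight=0.01):
    """Rank unheard titles by tag overlap with the profile plus a small social boost."""
    profile = build_profile(history)
    scores = {}
    for title, tags in CATALOGUE.items():
        if title in history:
            continue  # only suggest titles the listener has not played yet
        tag_score = sum(profile[tag] for tag in tags)
        scores[title] = tag_score + social_weight * PEER_PLAYS.get(title, 0)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend(["Quiet Rivers"]))
```

A listener who has played only "Quiet Rivers" is steered first towards the title that shares the most tags with it, with peer popularity breaking the remaining ties.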

But lately, as I look through my digital library, I’ve begun to notice a certain sameness to the books on my shelf. They all deal with the same themes: content I am interested in, but which hardly represents the eclecticism that I’d like to think characterises my reading habits. I realised that I’ve allowed myself to be led by algorithms, consuming everything they offered me without seeking suggestions from outside the box. Over time, my original preferences were reinforced with every recommendation I accepted, until in the end there was scant variation in the content I was being presented.
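
The narrowing described above is easy to reproduce in miniature. The sketch below assumes a deliberately crude recommender that always suggests the reader's most-read genre, and a reader who always accepts. The genre list and both function names are invented for illustration.

```python
from collections import Counter

# Hypothetical shelf of genres the service can recommend from.
GENRES = ["history", "science", "fiction", "poetry", "travel"]


def simulate(initial_reads, rounds):
    """Accept, round after round, a suggestion for the most-read genre so far."""
    reads = Counter(initial_reads)
    for _ in range(rounds):
        # The recommender suggests whatever the reader has consumed most.
        suggestion = max(GENRES, key=lambda genre: reads[genre])
        reads[suggestion] += 1  # the reader accepts without looking elsewhere
    return reads


def top_share(reads):
    """Fraction of all reads that belong to the single most-read genre."""
    return max(reads.values()) / sum(reads.values())


start = ["history", "history", "science", "fiction", "poetry"]
print(top_share(simulate(start, 0)))   # share of the top genre before any recommendations
print(top_share(simulate(start, 20)))  # share after twenty accepted recommendations
```

With this starting shelf, the leading genre grows from two reads in five to twenty-two in twenty-five, simply because every accepted suggestion makes the next identical suggestion more likely.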

I’ve since made a concerted effort to climb out of the hole I’d dug myself into. I now go to bookstores—not to buy physical books but to search the shelves for titles that catch my eye—to discover writing unlike anything I’ve ever read before but which I might like. I try to frequent music blogs to find out what these curators are listening to, looking to discover for myself something novel and interesting. I end up liking only a small fraction of the artists that I come to hear of using this hit-and-miss method but every now and then I come across a true gem—music so unlike anything I’ve heard before that the algorithms would never have thought of recommending it to me.

There are many good reasons to use algorithms. It is humanly impossible to manually process the full range of data that we now have at our disposal, and we could do with all the help we can get. But we have to be mindful of the limitations that algorithms operate under. They are only as good as the data they’ve been trained on and, since they are designed to be self-learning, the data sets on which they base their decisions are constantly evolving. Unless we regularly prune these data sets, keeping them fresh and relevant to the purpose for which the algorithms are intended to be used, the results will be far from helpful.

Rahul Matthan is a partner at Trilegal. Ex Machina is a column on technology, law and everything in between. His Twitter handle is @matthan
