Can A.I. understand literature?

31 Jan 2019

The nice thing about working at an AI company is that sometimes I get to see cool tech-related stuff that is only tangentially related to my day-to-day work. It’s even better when it turns out to be stuff that legitimises my degree in English Literature – not exactly famous for being the most vocational of subjects.

This is how I felt the other day when John, the brains behind the technology at Discover AI, posted this piece, catchily titled “Measuring the similarity of books using TF-IDF, Doc2vec and TensorFlow”. It explores what happens when John uses “machine learning to map books onto a two-dimensional plane so we can explore a library and potentially find other titles we may be interested in reading”. You should read his article if you want to find out how he did this – there’s even a link to the code behind it, if you’d like to try it out yourself. You should read his article anyway, because the magic of a hyperlinked internet means we all have multiple attention spans and can jump between several things at once.

However, you should stay here (or come back – read in whatever order you like) if you want to see what happens when a nostalgic ex-English student, whose job involves qualitatively analysing AI-sourced data, takes a look at the results.


To create a picture like this, John had to reduce 128 dimensions down to just 2, in order to render it in a visual format 'humans' can understand. For a Discover AI project, we use all 128 dimensions to measure similarity...
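For the curious, here is a minimal sketch of what "measuring similarity" across all 128 dimensions could look like. It assumes cosine similarity as the measure, which is a common choice for document vectors but is not confirmed as the one used at Discover AI, and the three "book" vectors are randomly generated stand-ins rather than real embeddings.

```python
import math
import random

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

DIMENSIONS = 128  # the number of dimensions mentioned in the post

# Hypothetical vectors: these are random stand-ins, not real book embeddings.
rng = random.Random(42)
book = [rng.gauss(0, 1) for _ in range(DIMENSIONS)]
# A "similar book": the same vector with a little noise added.
similar_book = [x + 0.1 * rng.gauss(0, 1) for x in book]
# An "unrelated book": an independent random vector.
unrelated_book = [rng.gauss(0, 1) for _ in range(DIMENSIONS)]

sim_close = cosine_similarity(book, similar_book)
sim_far = cosine_similarity(book, unrelated_book)
print(f"similar book:   {sim_close:.3f}")
print(f"unrelated book: {sim_far:.3f}")
```

The comparison uses every one of the 128 numbers at once, which is why a flat, two-dimensional picture can only ever be an approximation of how close two books really are.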

I think this is fascinating. But before we can even begin to look at the results, we have to understand how we got here. In order to do the analysis, John removed all words which were considered ‘generic’ – those which “contributed little to the essence of the story” – to make sure the machine could differentiate between the texts. This is a bold statement. What constitutes the ‘essence of a story’? This question isn’t a million miles away from how we define our sourcing areas at the start of a project. Just as we ask which websites and brands can get to the heart of a category, here we ask which sentences and sentence structures are crucial to the story.
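To make the idea of a ‘generic’ word a little more concrete, here is a rough sketch of TF-IDF, the technique named in the title of John's article. This is not his code: the three tiny "books" are invented for illustration, and the exact weighting formula is one common textbook variant. The point is simply that a word appearing in every book scores zero, while a word that is frequent in one book and absent from the others scores highly.

```python
import math
from collections import Counter

# Hypothetical toy "books", invented for illustration only.
docs = {
    "ghost_story": "the ghost visited the miser and the miser wept",
    "sea_story": "the captain steered the submarine and the crew cheered",
    "love_story": "the sister danced and the gentleman proposed",
}

def tf_idf(docs):
    """Score each word in each document: term frequency x inverse document frequency."""
    tokenised = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenised)

    # Document frequency: in how many books does each word appear at least once?
    doc_freq = Counter()
    for words in tokenised.values():
        doc_freq.update(set(words))

    scores = {}
    for name, words in tokenised.items():
        counts = Counter(words)
        scores[name] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

scores = tf_idf(docs)
ghost_scores = scores["ghost_story"]
top_word = max(ghost_scores, key=ghost_scores.get)

print("score for 'the':", ghost_scores["the"])  # appears in every book
print("most distinctive word:", top_word)
```

Here "the" and "and" appear in all three books, so they are treated as generic, whereas "miser" belongs to the ghost story alone. Whether that makes "miser" the ‘essence’ of the story is, of course, exactly the question.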

Of course, this in itself is just one analysis: that of the similarity of story. If we wanted to analyse how similar these novels were by structure, would we deem different words essential? Or if we wanted to understand how they used point of view, would pronouns be the only essential words?

But in this instance, we’re focusing on story. In this light, some of the results have obvious causes, and others raise interesting questions. It makes sense that the Dickens novels are bunched together – but why are Great Expectations and A Tale of Two Cities closer to each other than either is to A Christmas Carol? It is interesting that, despite its ghost-based narrative, A Christmas Carol is much closer to Jane Austen than to the more fantastic H.G. Wells. Does this suggest it is more a romance than a fantasy? It is also potentially surprising that Jules Verne and H.G. Wells, both founders of modern science fiction in their way, are so far apart. Wells was purposefully choosing to move away from the Romantic Science Fiction of Verne, and perhaps this stylistic shift is reflected in their distance.

I could go on, but I shouldn’t. If you’re interested in learning more about the ways data has been used for literary criticism in the past, this review of Franco Moretti’s only partially successful attempt is a good starting point.

However, I think what this exercise demonstrates is that using AI in this way forces us to be rigorous about how we define the terms of our questions, and gives us an opportunity to challenge our own thinking. The results it then throws up can push that thinking in new directions – whether that be for analysing literature, or for a particular brand challenge.

If you want to learn more about how AI can help us be more human by challenging our thinking, then contact us.


Written by Hannah Marcus
With a background in semiotics and a newly-acquired MA in Cultural Studies, Hannah is almost too excited by overlaps in technology and feminism. Her hobbies include ballroom dancing and talking about her period.
