Mayhem in the Library: A Very Short Introduction to Machine Learning

In this post we explain how machine learning can be used to write copy that mimics a human copywriter. We also delve into how our automatic copywriter writes accurate text that correctly describes an item of clothing.

What Is Machine Learning Anyway?

Traditionally, software makes decisions based on a set of rules coded by a developer. In machine learning (ML), the software makes decisions based on data. Our models rely on being shown examples of text written by professional copywriters from which the models learn how to write descriptions.

Because the AI copywriter chooses words and phrases based on data, we can evaluate its performance based on how closely its output matches the original human-written data. Much like modern retail, ML is KPI-driven. ML models are ranked on how well they perform at mimicking the original text.
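
As a rough illustration of such a score, and not the metric any particular system uses, here is a minimal sketch that measures what fraction of the machine-written words also appear in the human-written text. The names `tokenize` and `word_overlap` and the example sentences are made up for this sketch.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def word_overlap(generated, reference):
    """Fraction of generated words that also occur in the reference.

    Each reference word can be matched at most as many times as it
    appears, so repeating a correct word does not inflate the score.
    """
    gen_counts = Counter(tokenize(generated))
    ref_counts = Counter(tokenize(reference))
    total = sum(gen_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[word]) for word, count in gen_counts.items())
    return matched / total


# Hypothetical example texts, invented for this sketch.
human = "a floaty midi dress with a v-neck"
machine = "a midi dress with a round neck"
print(word_overlap(machine, human))
```

A score of 1.0 means every generated word was found in the human text; a score near 0 means the two texts share almost nothing.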

Mayhem in the Library

This kind of software may seem magical, but we can describe a simple ML model using a library metaphor. Imagine that you borrow all the books in a library. Then, to the horror of the librarian, you take each page and cut it to pieces sentence by sentence. At the end of this exercise, there will be a large pile of cut-out sentences. We can ask our model to write something by randomly picking a sentence from this pile. In ML, the process of cutting up sentences is called training and the act of picking something at random is called sampling.

This analogy demonstrates important properties of an ML algorithm. If sentences appear in multiple books, such as quotes from famous writers, then these sentences have a higher probability of being sampled. Also, only sentences that exist in the books can be sampled. If we trained only on romantic novels, it is extremely unlikely that our sample sentence would read like it was from a book about car mechanics.
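
The sentence pile can be sketched in a few lines of Python. This is a toy illustration under simple assumptions: the two "books" are invented, sentences are split with a rough regular expression, and the function names are ours.

```python
import random
import re


def train_sentence_model(books):
    """'Cut up' every book into sentences and return the pile."""
    pile = []
    for text in books:
        # Split after ., ! or ? followed by whitespace (a rough heuristic).
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        pile.extend(s for s in sentences if s)
    return pile


def sample_sentence(pile, rng=random):
    """Pick one slip of paper from the pile uniformly at random."""
    return rng.choice(pile)


# Two tiny invented "books" that share one sentence.
books = [
    "To be or not to be. That is the question.",
    "She quoted Hamlet. To be or not to be.",
]
pile = train_sentence_model(books)
print(sample_sentence(pile))
```

Because duplicates stay in the pile, the sentence that appears in both books is twice as likely to be drawn as either of the others, and nothing outside the pile can ever be written.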

Next, we take all the cut-out sentences and cut them into words and punctuation. Our original pile of paper strips has turned into a (much larger) pile of words. We can write a sentence by sampling words until we sample a piece of paper with sentence-ending punctuation. Some words, such as articles (“the”, “a”, “an”), will be sampled much more frequently than nouns and verbs. The generated sentences will have the same distribution of words as the sentences in the books. However, each word is sampled independently of its neighbours, so the output will almost certainly be gibberish.
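
The word-level pile works the same way, only with smaller slips. A minimal sketch, assuming a simple regular-expression tokenizer and a cap on sentence length (the cap is our addition so the loop always stops):

```python
import random
import re

END_MARKS = {".", "!", "?"}


def tokenize(text):
    """Split text into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def sample_sentence(pile, rng=random, max_tokens=50):
    """Draw slips at random until one carries sentence-ending punctuation."""
    tokens = []
    for _ in range(max_tokens):  # safety cap, an assumption of this sketch
        token = rng.choice(pile)
        tokens.append(token)
        if token in END_MARKS:
            break
    return tokens


pile = tokenize("One day I will find the right words, and they will be simple.")
print(" ".join(sample_sentence(pile)))
```

Every run prints a different jumble of the same words, which is the gibberish described above.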

A Way Out

We have two models: a sentence-level model and a word-level model. The sentence-level model will write fluent sentences, but it can only write what has appeared in the training set. The word-level model can write any possible sentence but will most likely write garbage. We can build a model that sits between these two extremes. Let’s use an example to explain how this new model works:

One day I will find the right words, and they will be simple.

When we cut up this sentence, we find that the word “will” appears twice. Our word-level model cannot discriminate between these two uses of the word, even though they appear in different contexts.

Let’s create another pile of words where we write on a piece of paper each word and its preceding word, separated by a bar character “|”. The two occurrences of “will” are recorded on paper like so:

will|I
will|they

The procedure for writing a sentence involves sampling words as before, but with a small twist. Imagine that we previously sampled the word “they” and we want to sample a second piece of paper. But now we introduce the rule that if the word after the bar character does not match our previously sampled word, in this case “they”, then the piece of paper goes back to the pile. In the case of the above example only the second occurrence of “will” is accepted by sampling.
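
The draw-and-return procedure can be sketched as follows. The start marker `<s>`, the draw budget and the function names are assumptions of this sketch, added so that the first word has a context and the loops always terminate.

```python
import random
import re

START = "<s>"  # hypothetical start-of-sentence marker, not from the post
END_MARKS = {".", "!", "?"}


def tokenize(text):
    """Split text into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def train_bigram_pile(sentences):
    """Write one (word, previous word) slip per token."""
    pile = []
    for sentence in sentences:
        previous = START
        for word in tokenize(sentence):
            pile.append((word, previous))
            previous = word
    return pile


def draw_next(pile, previous, rng, max_draws=10_000):
    """Redraw until the context on the slip matches the previous word."""
    for _ in range(max_draws):
        word, context = rng.choice(pile)
        if context == previous:
            return word
        # Otherwise the slip goes back on the pile and we draw again.
    return None  # no matching slip found within the draw budget


def sample_sentence(pile, rng=random, max_tokens=50):
    """Chain accepted slips until sentence-ending punctuation is drawn."""
    tokens, previous = [], START
    for _ in range(max_tokens):
        word = draw_next(pile, previous, rng)
        if word is None:
            break
        tokens.append(word)
        if word in END_MARKS:
            break
        previous = word
    return tokens


pile = train_bigram_pile(
    ["One day I will find the right words, and they will be simple."]
)
print(" ".join(sample_sentence(pile)))
```

With only one training sentence, the sole choice the model has is what follows “will”: either “find” or “be”. That is already enough to produce a sentence that is not in the training set, such as one that jumps from “I will” straight to “be simple”.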

We can use longer word histories. Extending our example with an additional word, we get two new pieces of paper for “will” during training:

will|day I
will|and they

Sampling will now discard words unless the history matches the two previously sampled words. This model will now produce more fluent sentences, at the expense of the ability to write sentences that do not appear in the training set. Longer word histories will make the model act more like the sentence-level system. Word history length is an example of a parameter, and we can see that varying parameters changes the behaviour of the model.
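
To see the effect of this parameter, here is a minimal sketch in which the history length is an argument. Instead of redrawing slips until one matches, it keeps a lookup table from each history to the words seen after it, which gives the same result with less work. The padding marker `<s>` and all names are assumptions of this sketch.

```python
import random
import re
from collections import defaultdict

END_MARKS = {".", "!", "?"}
START = "<s>"  # hypothetical padding marker, not from the post


def tokenize(text):
    """Split text into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def train(sentences, history_length):
    """Map each history (tuple of preceding tokens) to the words seen after it."""
    table = defaultdict(list)
    for sentence in sentences:
        history = (START,) * history_length
        for word in tokenize(sentence):
            table[history].append(word)
            # Slide the window forward by one token.
            history = (history + (word,))[1:]
    return table


def sample_sentence(table, history_length, rng=random, max_tokens=50):
    """Sample words whose recorded history matches the words written so far."""
    tokens = []
    history = (START,) * history_length
    for _ in range(max_tokens):
        candidates = table.get(history)
        if not candidates:
            break
        word = rng.choice(candidates)
        tokens.append(word)
        if word in END_MARKS:
            break
        history = (history + (word,))[1:]
    return tokens


sentences = ["One day I will find the right words, and they will be simple."]
for n in (0, 1, 2):
    table = train(sentences, n)
    print(n, " ".join(sample_sentence(table, n)))
```

With a history length of 0 this is the word-level model, and with a history length of 2 this tiny training set leaves no choices at all, so the model can only reproduce the original sentence.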

Accurate Text

Let’s extend our simple model to write descriptions of dresses. Instead of a library of books, we start from a product catalogue. The catalogue has pictures of dresses, each with a description. Each dress in the catalogue is tagged with salient attributes, such as neckline (“v-neck”, “round neck”, etc.). As we train the model, we add the tags to the word context alongside the word history.

Writing the description of a new dress involves first tagging the dress. We then sample words from the catalogue pile as before. If the tags on the sampled piece of paper do not agree with the tags of the new dress, the word goes back to the pile. The process is repeated until a word with matching tags is found.
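
A minimal sketch of tag-aware sampling follows. It assumes a one-word history, treats tags as “agreeing” only when the whole tag set is identical, and uses a lookup table in place of repeated redraws. The three catalogue entries are invented for illustration.

```python
import random
import re
from collections import defaultdict

START = "<s>"  # hypothetical start marker, not from the post
END_MARKS = {".", "!", "?"}


def tokenize(text):
    """Split into words and punctuation, keeping "v-neck" as one token."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)


def train(catalogue):
    """catalogue: list of (description, tags) pairs; tags is a set of strings."""
    table = defaultdict(list)
    for description, tags in catalogue:
        previous = START
        for word in tokenize(description):
            # The context is the previous word together with the dress tags.
            table[(previous, frozenset(tags))].append(word)
            previous = word
    return table


def describe(table, tags, rng=random, max_tokens=50):
    """Write a description using only words recorded with matching tags."""
    tokens, previous = [], START
    for _ in range(max_tokens):
        candidates = table.get((previous, frozenset(tags)))
        if not candidates:
            break
        word = rng.choice(candidates)
        tokens.append(word)
        if word in END_MARKS:
            break
        previous = word
    return tokens


# Invented catalogue entries, for illustration only.
catalogue = [
    ("A red dress with a v-neck.", {"v-neck"}),
    ("A blue dress with a flattering v-neck.", {"v-neck"}),
    ("A black dress with a round neck.", {"round neck"}),
]
table = train(catalogue)
print(" ".join(describe(table, {"v-neck"})))
```

Because words from the round-neck dress are stored under a different tag set, a description written for a v-neck dress can mix the two v-neck descriptions but can never mention a round neck.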

We can now evaluate the quality of the model. A small number of dresses are excluded from the training data, and we use the AI to write descriptions of these dresses. If the AI copywriter is performing well, it will reproduce many of the words and phrases in the original human-written descriptions.

Our Automatic Copywriter

Modern ML techniques use a type of model called a neural network. One very recent example of a strong neural network model is OpenAI’s GPT-2. Neural networks use millions of parameters to represent words. Training a neural network involves varying these parameters by a tiny amount at a time and comparing the model’s output to the training set. This parametrisation gives the model more ability to synthesise new sentences, at the risk of writing inaccurate descriptions.

Our research has focused on ensuring that the neural network produces accurate descriptions. We supply the AI copywriter with examples of common mistakes to avoid. For example, if the dress has a v-neck then the copywriter should not talk about round necklines. We can then punish the model during training if it makes mistakes. This research ensures that we have an accurate model that also mimics the brand-aware tone of voice of the human copywriters.
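
One generic way to “punish” a model during training is to add a penalty term to the score it is trained to minimise. The sketch below is an illustration of that idea only and should not be read as the actual training objective: the function name, the `weight` setting and the probabilities are all invented.

```python
import math


def penalised_loss(probabilities, target, forbidden, weight=1.0):
    """Cross-entropy on the target word plus a penalty on forbidden words.

    probabilities: dict mapping each candidate next word to its probability.
    target: the word the human copywriter actually wrote.
    forbidden: words that would contradict the item's tags.
    weight: hypothetical knob for how strongly mistakes are punished.
    """
    cross_entropy = -math.log(probabilities[target])
    penalty = sum(probabilities.get(word, 0.0) for word in forbidden)
    return cross_entropy + weight * penalty


# A v-neck dress: the model should put little probability on "round".
careful = {"v-neck": 0.9, "round": 0.05, "plunging": 0.05}
careless = {"v-neck": 0.5, "round": 0.45, "plunging": 0.05}
print(penalised_loss(careful, "v-neck", {"round"}))
print(penalised_loss(careless, "v-neck", {"round"}))
```

The careless model receives the larger loss, both because it is less sure of the correct word and because it assigns more probability to a word that contradicts the tag.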
