Editor’s note: This article is part of a multi-part series co-written by DEG Creative Special Projects Manager Maril Hazlett.
Welcome to round two of Shrey and Maril’s series on the beautiful bond between analytics and creative. This time we’re diving into how a fuzzy matching analysis can help you identify brand content trends that might be causing your audience to disengage on social media.
Maril Hazlett (MH): This story starts with a common business problem clients wrestle with every day on social media: How to produce the maximum amount of engaging content in a cost-efficient manner.
We all know that the more engaging your content, the more engaged your audiences, and the better return you receive on conversions that advance your business goals.
Shrey Bhatnagar (SB): However, in a regulated industry this problem gets more complicated, because every piece of content requires additional time for compliance approval.
In one case, we saw a client struggling to maintain a minimum number of posts per month. The solution this client was pursuing was repeating successful content with minor language tweaks and the same image. This meant a lot of repeated posts.
MH: I do see the logic, but also the fallacy. One thing to remember is that social media content is not just about each individual post. Maintaining audience engagement means providing the right cumulative content mix over time. Hitting people over the head with the same post over and over is unlikely to keep them engaged.
SB: And as it happened, this client did see audience engagement drop. So, we decided to test our hypothesis: that re-posting content multiple times was a contributing factor to the decreased engagement. This brings us to fuzzy matching.
MH: What a great band name!
SB: Ha, yeah. Fuzzy matching is a methodology that helps us process word-based queries, where the text strings are not always an exact match. It’s a form of computer translation that helps us find related phrases or words in a dataset. (The program I used for this analysis was Alteryx.)
MH: So, when the client posted multiple versions of the same post copy, that produced a range of overlapping words and phrases, and all of that repetition became part of our overall data set of post copy.
Multiple versions of multiple posts created a data set containing a LOT of potentially overlapping, repetitive words and phrases. (To be fair, the industry itself meant a preponderance of overlapping nouns would be present.)
So, this data set had a ton of noise. But you needed to find some very specific matching signals.
SB: In the absence of an exact match for a sentence or phrase, fuzzy matching attempts to find a match that is similar to the original phrase, then filters the results so that we only see matches above a user-defined threshold percentage.
MH: Yay math! I sense it coming. Because you set that matching percentage in the tool, right?
SB: Correct! I configured the tool to match strings using the Double Metaphone algorithm and set a 75 percent matching threshold.
MH: And Double Metaphone is the algorithm built into Alteryx that actually makes fuzzy matching happen.
SB: One of them. Double Metaphone seemed to make the most sense here because it uses a lot of information about variations and inconsistencies in English spelling and pronunciation. It also creates two encoded versions of the string in question, for better accuracy.
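For readers who want to see the mechanics, here is a minimal Python sketch of threshold-based fuzzy matching. Note that it uses the standard library's edit-based `difflib.SequenceMatcher` as a stand-in for the phonetic Double Metaphone keys Alteryx generates, and the posts are illustrative, not the client's actual copy:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_fuzzy_matches(posts, threshold=0.75):
    """Pair up posts whose copy scores at or above the threshold."""
    matches = []
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            score = similarity(posts[i], posts[j])
            if score >= threshold:
                matches.append((i, j, round(score, 2)))
    return matches

# Illustrative posts, not real client data
posts = [
    "Save big this weekend on home loans!",
    "Save big this week on home loans!",
    "Meet our new branch manager in Topeka.",
]
print(find_fuzzy_matches(posts))  # only the two near-duplicate posts pair up
```

The 0.75 threshold mirrors the 75 percent setting described above: lower it and you catch looser paraphrases, raise it and you only flag near-copies.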
MH: At this point you had decided on the methodology, the tool, and the specific algorithm within the tool. Now explain more about the data set.
SB: We had about three years of organic post-level engagement from Facebook. The dimensions were post date, post ID, and post message. The metrics were total engagement, impressions, and engagement rate.
MH: Okay, non-data folk (such as my creative colleagues reading this), let’s pause for a teachable moment: The difference between dimensions and metrics.
Dimensions describe what the data is (e.g., a post date). Metrics count or measure the data (e.g., 1,000 impressions). Dimensions = qualitative (usually words). Metrics = quantitative (numbers). You can combine multiple dimensions, too.
SB: And dimensions are not JUST words, they can be numbers, too. But the major difference is that you can’t perform mathematical operations on dimensions.
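To make the distinction concrete, here is a hypothetical row of post-level data (field names and numbers are invented for illustration):

```python
# A hypothetical row of post-level data, illustrating dimensions vs. metrics
row = {
    # Dimensions describe the observation (post_id is a number, but it's a label)
    "post_date": "2019-03-14",
    "post_id": 4481,
    "post_message": "Spring savings start today!",
    # Metrics are quantities you can meaningfully aggregate
    "impressions": 1250,
    "total_engagement": 38,
}

# Math works on metrics: engagement rate = engagement / impressions.
row["engagement_rate"] = row["total_engagement"] / row["impressions"]
print(f"{row['engagement_rate']:.2%}")  # prints 3.04%
# Summing post_id across rows, by contrast, would be meaningless.
```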
SB: Here’s a workflow of the fuzzy matching analysis. You can tell from the workflow that I took a few measures to avoid erroneous matches: I excluded certain words, characters, and symbols so that the algorithm did not use them to generate matching keys.
- All Unicode characters (symbols, currency codes, emoticons, etc.)
- Single-letter words and short abbreviations, such as directions (N, S, E, W) or street abbreviations (Blvd)
- Link URL strings (bit.ly, ow.ly)
Next, I excluded holiday posts, which all tend to be very similar to one another but are exceptions to the main content focus. I also excluded all matches where the interval between the posts’ publish dates was greater than four months.
Last, I normalized the data by reviewing only posts with more than 100 impressions, using engagement rate as the decision variable. This weeded out scenarios where the second post might show lower engagement simply because it received fewer impressions.
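The text-cleaning part of that pre-processing can be sketched in Python. This is an illustrative approximation of the exclusions described above, not the actual Alteryx workflow; a fuller version would also carry a stop-list for abbreviations like Blvd:

```python
import re

def clean_post_copy(text: str) -> str:
    """Strip entities that would generate spurious match keys (a sketch)."""
    # Drop link strings built on common URL shorteners
    text = re.sub(r"\S*(?:bit\.ly|ow\.ly)\S*", " ", text)
    # Drop non-ASCII characters: symbols, currency signs, emoticons, etc.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Drop single-letter words (N, S, E, W, stray initials)
    text = re.sub(r"\b[A-Za-z]\b", " ", text)
    # Collapse the leftover whitespace
    return re.sub(r"\s+", " ", text).strip()

print(clean_post_copy("Visit our N Main Blvd branch \u2014 details: bit.ly/abc123 \U0001F389"))
```

Cleaning before matching matters because these tokens appear in nearly every post, so leaving them in would inflate similarity scores between otherwise unrelated copy.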
MH: When you ran the fuzzy matching tool on the data set, what was the result?
SB: Just as we surmised. In our dataset, the engagement rate dropped by an average of 0.89 percentage points, or approximately 70 percent, on the second posting of strongly similar content. Check out this graph:
MH: Ouch! Pretty clear. Second posting, lower engagement rate. Ba-boom.
Now I know what comes next. After running an analysis, it’s time to poke holes in it, right? Or at least acknowledge some of the limitations and biases.
SB: You said it. There were a few cases where the second post performed better than the first. That made me reconsider the time interval. So, I ran another report, this time breaking the four-month period down into weeks. As you can see from the chart below, the longer the interval between re-posts, the smaller the drop-off in engagement rate. But re-posts within two months of the original consistently underperformed.
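The weekly breakdown boils down to bucketing each matched pair by its re-post interval and averaging the change in engagement rate per bucket. A minimal sketch, with invented dates and rates standing in for the real matched pairs:

```python
from datetime import date

def weeks_between(first: date, second: date) -> int:
    """Whole weeks elapsed between the original post and its re-post."""
    return (second - first).days // 7

def drop_by_interval(pairs):
    """Average the engagement-rate change per re-post interval, in weeks.
    `pairs` is (first_date, second_date, first_rate, second_rate);
    the data below is illustrative, not the client's."""
    buckets = {}
    for d1, d2, r1, r2 in pairs:
        buckets.setdefault(weeks_between(d1, d2), []).append(r2 - r1)
    return {wk: sum(v) / len(v) for wk, v in sorted(buckets.items())}

pairs = [
    (date(2019, 1, 7), date(2019, 2, 4), 0.012, 0.004),   # 4-week re-post
    (date(2019, 1, 7), date(2019, 4, 15), 0.012, 0.010),  # 14-week re-post
]
print(drop_by_interval(pairs))
```

In this toy data the four-week re-post loses far more engagement rate than the fourteen-week one, which is the shape of the pattern the weekly report surfaced.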
MH: Thanks to fuzzy matching, you felt safe in concluding that when the client re-posted identical or very similar content within a short time frame, it decreased the audience engagement. That’s the what.
When I took a look at the data, too, it gave me a better idea of the why from a qualitative point of view. Social posts are so short and tight in structure that it doesn’t take much to break them. Successful post copy usually balances not only brand voice and tone, but also word length, interior white space, use of interrogatives, variation in sentence length, overall character count, persuasive calls to action, strong teasers, and so on.
The non-exact match repeat versions didn’t tweak much, but just enough to lose the flow. They bloated on one or two counts, chose an awkward synonym, switched from interrogative to statement, lost an imperative structure, that kind of thing. These changes probably contributed to formerly successful content now underperforming.
We also saw that in the outliers, where the second post did get more engagement, the re-post used more internet-savvy language and references, like emojis and internet acronyms (ICYMI, DYK). Even in a regulated industry, where brands usually steer clear of anything remotely resembling contemporary internet language!
SB: Final ruling on frequent content repetition: not a great way to add variety and value to your social feed, and ultimately not cost-effective, because it will cost you engagement. And once you lose your audience, it can be very hard to win them back.
MH: Except for you, dear reader. We have no intention of losing you. Until next time!