The epidemiological equation is a basic tool for understanding the viral nature of messages, and its j and k statistics, if properly reported, are powerful, revealing social media metrics. Here's an example.
The epidemiological equation is $R_{t+1} = kR_t(1 - R_t) - jR_t$, where:
- t=observation period. For example, day, month, or hour; if you take your first observation at 9:00 a.m. and hourly thereafter, at 9:00 a.m. t=1, at 10:00 a.m. t=2, at 9:00 p.m. t=13.
- R=rate of participation. What fraction of the relevant people are repeating the message, infected with the disease, humming the tune, wearing the shoes?
- j=die-off. What fraction of participants at period t will stop participating by period t+1? It has meaningful values between 0 percent and 100 percent.
- k=virality (think "konversion" or "kontagiousness"). How fast do participants convert (or infect) nonparticipants? For math reasons we won't go into here, k can be anywhere between –4 and +4.
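To make the definitions above concrete, here is a minimal Python sketch that steps the equation forward one period at a time. The function names `step` and `simulate` are illustrative, not from the original analysis, and clamping each rate to the range 0 to 1 is an added assumption so the output always reads as a fraction of the community.

```python
def step(r, k, j):
    """One period of the model: R_{t+1} = k*R_t*(1 - R_t) - j*R_t."""
    return k * r * (1.0 - r) - j * r


def simulate(r1, k, j, periods):
    """Return [R_1, ..., R_periods], starting from the first observed rate r1."""
    rates = [r1]
    for _ in range(periods - 1):
        nxt = step(rates[-1], k, j)
        # Assumption (not in the equation itself): keep the rate a valid fraction.
        rates.append(min(1.0, max(0.0, nxt)))
    return rates


if __name__ == "__main__":
    # Hypothetical inputs: 3 percent participation at t=1, nine periods.
    for t, r in enumerate(simulate(0.03, k=2.4, j=0.52, periods=9), start=1):
        print(f"t={t}: R={r:.3f}")
```

Changing `k` and `j` in the last call is a quick way to see how the two statistics shape the curve.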
Let's see what we can learn by estimating the values of j and k in a real-world situation.
Currently a hot area in fiction publishing, traditional and indie, is urban fantasy: the publisher marketing category for books about vampires, werewolves, and other magical beings in the modern industrial world. How effectively will the hashtag #urbanfantasy promote an urban fantasy novel on Twitter?
I manually captured who was tweeting the hashtag #urbanfantasy, and how often, over an easily accessed nine-day period. Here's what one afternoon's quick analysis showed about #urbanfantasy's potential as a marketing hashtag:
On an arbitrarily chosen Day 0, #urbanfantasy did not appear on Twitter. On Day 1, three people tweeted it; over the eight subsequent days, a densely connected community of 107 people tweeted #urbanfantasy with these frequencies:
Those percentages are R1 through R9. On a smoothed graph:
Since we have Rt for t=1 to 9, we can estimate k and j by three different methods:
- Lagged-variable regression.
- Brute-force with a sum of squared errors (SSE) test, from the individual behavior of the function (the transition from R1 to R2 is treated separately from the transition from R5 to R6, or any other adjacent transition, and they're all scalars).
- Brute-force with an SSE test, from the trajectory of the function (the pathway from R1 to R9 is treated as one transition, a vector).
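The two brute-force methods differ only in what gets compared, which a short Python sketch can show. This is one plausible reading of the approach, not the code used for the analysis: the grid spacing, the function names, and the noise-free synthetic series (generated from known values of k and j, not the #urbanfantasy data) are all assumptions.

```python
def step(r, k, j):
    """One period of the model: R_{t+1} = k*R_t*(1 - R_t) - j*R_t."""
    return k * r * (1.0 - r) - j * r


def sse_individual(rates, k, j):
    """One-step errors: each observed R_t predicts the next observed R_{t+1}."""
    total = 0.0
    for current, observed_next in zip(rates, rates[1:]):
        error = step(current, k, j) - observed_next
        total += error * error
    return total


def sse_trajectory(rates, k, j):
    """Whole-path errors: simulate forward from R_1, compare with every later point."""
    total = 0.0
    r = rates[0]
    for observed in rates[1:]:
        r = step(r, k, j)
        error = r - observed
        total += error * error
    return total


def grid_search(rates, sse, k_step=0.1, j_step=0.01):
    """Try every (k, j) with k in [-4, 4] and j in [0, 1]; return (sse, k, j)."""
    best = (float("inf"), None, None)
    for ki in range(round(8 / k_step) + 1):
        k = -4.0 + ki * k_step
        for ji in range(round(1 / j_step) + 1):
            j = ji * j_step
            err = sse(rates, k, j)
            if err < best[0]:
                best = (err, k, j)
    return best


# Synthetic, noise-free series (NOT the article's data): nine periods from known k, j.
synthetic = [0.03]
for _ in range(8):
    synthetic.append(step(synthetic[-1], 2.4, 0.52))

for name, sse in (("individual", sse_individual), ("trajectory", sse_trajectory)):
    err, k, j = grid_search(synthetic, sse)
    print(f"{name}: k={k:.1f}, j={j:.2f}, SSE={err:.3g}")
```

On clean synthetic data both searches should land on the same grid point; on real, noisy observations they can disagree, and the size of that disagreement is itself informative.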
I applied all three. Regression gave a meaningless negative value for j, with enormous error bars. Inspecting the residuals showed that this failure was actually good news: the regression misbehaved because $R_t$ and $R_t^2$ are not fully independent and are of nearly equal importance. Interdependence is intrinsic to the math, but coequality of significance confirms that the structure of the epidemiological model is appropriate to the problem.
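The regression itself isn't shown here, so the following is only a sketch of one common setup. Expanding the equation gives $R_{t+1} = (k - j)R_t - kR_t^2$, so regressing each rate on the previous rate and its square (with no intercept) yields two coefficients, a and b, from which k = -b and j = k - a. The function name and the synthetic series are assumptions.

```python
def lagged_regression(rates):
    """Fit R_{t+1} = a*R_t + b*R_t^2 by least squares (no intercept); return (k, j)."""
    x1 = rates[:-1]                      # lagged rate R_t
    x2 = [r * r for r in x1]             # lagged rate squared R_t^2
    y = rates[1:]                        # next-period rate R_{t+1}

    # Normal equations for two predictors, solved with Cramer's rule.
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("R_t and R_t^2 are perfectly collinear; no unique fit")

    a = (s1y * s22 - s12 * s2y) / det
    b = (s11 * s2y - s12 * s1y) / det
    k = -b           # coefficient on R_t^2 is -k
    j = k - a        # coefficient on R_t is k - j
    return k, j


# Synthetic, noise-free series (NOT the article's data) built from k=2.4, j=0.52.
series = [0.03]
for _ in range(8):
    r = series[-1]
    series.append(2.4 * r * (1 - r) - 0.52 * r)

k_hat, j_hat = lagged_regression(series)
print(f"k={k_hat:.3f}, j={j_hat:.3f}")
```

Because the two predictors move together, `det` tends to be small, so modest noise in real observations can swing the estimates widely. That is consistent with the negative j and wide error bars described above.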
So, as often happens in these problems, regression simultaneously tells us we're on the right track and refuses to give an answer. The brute-force individual-behavior estimates were k=2.4, j=0.52, with an SSE of about 0.27; the brute-force trajectory estimates were k=3.2, j=1, with an SSE of about 0.24. In a small case like this, chances are that the true j and k lie between the individual and trajectory estimates, and because the exact values of j and k are less important than the behavioral range they fall into, those numbers tell us:
- The social process runs on an internal dynamic. It is not being driven by outside factors like a celebrity, holiday, or news event. An insignificant difference in SSE between the trajectory and individual approaches indicates this.
- #urbanfantasy has lousy persistence. Persistence=100%-j, so it's somewhere between 0 percent and 48 percent -- an F at any school.
- #urbanfantasy is so viral it's chaotic. It's retweeted in sudden up-and-down bursts, sometimes absent completely, sometimes sweeping the community. Scoring virality as (k+4)/8 to put it on the same "grade book" basis as persistence, k=2.4 corresponds to 80 percent and k=3.2 to 90 percent: a solid B in virality. It may not last long, but it's catchy.
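The "grade book" arithmetic in the list above is short enough to check in a few lines of Python. The function names are illustrative; the formulas are the ones stated in the text, with j and k expressed as plain numbers (j=0.52 rather than 52 percent).

```python
def persistence(j):
    """Persistence = 100% - j, returned as a fraction between 0 and 1."""
    return 1.0 - j


def virality_score(k):
    """Rescale k from the range [-4, +4] to a 0-to-1 score: (k + 4) / 8."""
    return (k + 4.0) / 8.0


# The two pairs of estimates quoted in the text.
for label, k, j in (("individual", 2.4, 0.52), ("trajectory", 3.2, 1.0)):
    print(f"{label}: persistence={persistence(j):.0%}, virality={virality_score(k):.0%}")
```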
These three conclusions suggested some further non-epidemiological analysis, which revealed two more points of interest:
- Overcoming #urbanfantasy's persistence problem would be cheap. A log R:log F regression (log rank against log frequency) showed that the #urbanfantasy community fits a Zipf distribution, with the No. 1 user tweeting the hashtag 56 times, and 65 users tweeting it only once. The top nine (out of 107) participants accounted for exactly half of all tweets. If we target those urban fantasy evangelists with advance reading copies (ARCs) with "urban fantasy" prominent on the cover, we can probably start an #urbanfantasy twitterdemic about a book.
- Twitter provides tremendous potential leverage for #urbanfantasy. Plugging the trajectory j and k estimates into the equation, we can calculate that if the nine most active point-of-contact social users tweeted about a book and hashtagged it #urbanfantasy, 37,000 followers would receive at least one retweet or response in one week. That's about 60 percent penetration of the 62,734 unique followers of the 107 active tweeters (unique means people who followed more than one were counted only once).
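For readers who want to try the Zipf check on their own hashtag, here is a rough Python sketch of a log-rank versus log-frequency fit plus a "top n share" calculation. The per-user tweet counts from the study aren't listed here, so the sample below is an idealized, made-up Zipf series (frequency proportional to 1/rank) used only to exercise the code. A fitted slope near -1 is the classic Zipf signature.

```python
import math


def fit_loglog(counts):
    """Least-squares line through (log rank, log frequency); returns (slope, intercept)."""
    ranked = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def top_share(counts, n):
    """Fraction of all tweets contributed by the n most active users."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:n]) / sum(ranked)


# Idealized, hypothetical counts for 107 users (NOT the observed data).
ideal = [56.0 / rank for rank in range(1, 108)]

slope, intercept = fit_loglog(ideal)
print(f"slope={slope:.2f}, top-nine share={top_share(ideal, 9):.0%}")
```

With real counts, replace `ideal` with one tweet total per user; every count must be at least 1, since the fit takes logarithms.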
Of course, this was all an afternoon's amusement. For real-world clients, I'd examine a longer period and test additional hashtags like #sexyvampires, #shapeshifters, and #urbanwerewolves, seeking a hashtag that was at least as contagious (k=2.4 or better) and much more persistent (j=0.3 or lower) -- ideally a "straight A" hashtag (i.e. j<0.1 and k>3.2) reaching a bigger community. But even if we didn't... Hey, nine ARCs with the right two words reach 60 percent of a sizable, passionate reader community in one week.
The verdict is, #urbanfantasy might not be the very best hashtag for helping your vampire-slayer novel go viral, but it will certainly do till you find something better.