Most questions that savvy marketers share are more complex than these, and over at the Email Measurement and Accuracy group, Luke Glasner and John Caldwell have assembled a team to foster universal metrics and develop protocols that are consistent across all ESPs and enterprises.
Establishing measurement benchmarks that are universally accepted within the email industry is a challenge; however, I believe the future of quality email measurement may well be rooted in what I call a "Quality Email Score" for each type of mailing. Let me explain...
Consider this:
Once a definition of each email measurement is universally accepted and recognized, an email campaign could be rated with a four- or five-digit integer. This integer would be calculated after a set number of hours (72, for example), to allow for a decent-sized sample. This Quality Email Score (QES) is plausible if we include a grouping of pertinent and well-recognized email measurements as variables for each mailing.
Such measurements may include, but are not limited to, Open Rate (Render Rate), CTR, Domain Reputation, Inbox Delivery Rate, Spam Complaints, Frequency (list fatigue), Relevance, Engagement, List Quality, Content, Conversions, ROI, Subscriber Influence, AOV, etc., or any subset of these measurements. The proposal involves innovative leaders from the email and analytics industries facilitating, and then determining, a cumulative email algorithm for each deployment based on which variables are being tested. In other words, marketers would receive one score for each email deployment. We can also break this down into a Quality Score for each particular measurement, then mix and match.
Each of these measurements will be weighted differently within the algorithm; this process may also include factoring in different types of industries.
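As a rough sketch of how such a weighted algorithm might look, assume each measurement has already been normalized to a value between 0 and 1 (higher is better) and that a set of weights has been agreed on. The function name, the measurement names, the weights, and the 0 to 9999 scale below are all illustrative assumptions, not a proposed standard.

```python
def quality_email_score(measurements, weights, scale=9999):
    """Combine normalized measurements into a single integer score.

    measurements: dict of name -> value in [0.0, 1.0], higher is better
    weights:      dict of name -> relative importance (hypothetical values)
    scale:        highest possible score (9999 gives a four-digit integer)
    """
    # Only measurements that have a weight contribute to the score.
    shared = [name for name in weights if name in measurements]
    if not shared:
        raise ValueError("no weighted measurements supplied")

    for name in shared:
        if not 0.0 <= measurements[name] <= 1.0:
            raise ValueError(f"{name} must be normalized to the range 0..1")

    total_weight = sum(weights[name] for name in shared)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted average of the measurements, stretched onto the integer scale.
    weighted_sum = sum(weights[name] * measurements[name] for name in shared)
    return round(scale * weighted_sum / total_weight)


# Hypothetical weights; a real set would differ by industry.
example_weights = {
    "inbox_delivery_rate": 3,
    "open_rate": 2,
    "ctr": 2,
    "complaint_free_rate": 3,  # 1 minus the spam complaint rate
}
example_measurements = {
    "inbox_delivery_rate": 0.95,
    "open_rate": 0.25,
    "ctr": 0.05,
    "complaint_free_rate": 0.999,
}

score = quality_email_score(example_measurements, example_weights)
print(score)  # 6446
```

Swapping in a different weight table per industry would change the score without changing the function, which is one way the industry factor mentioned above could be handled.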
Once the campaign is deployed, the Quality Email Score can be compiled and generated. Since every email campaign would be treated differently, each type of email stream may warrant its own algorithm. This could mean that each type of email stream would have a score with a different number of digits. For example, confirmation emails might have a simple two-digit score since fewer measurements would be calculated, while transactional streams with social media integration would display a three-digit score. B2B campaigns may be computed differently than B2C; once again, this difference would be based on the measurement groupings within each campaign. However, both could still use the four-digit integer.
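To illustrate the idea of one algorithm per email stream, here is a minimal sketch in which each stream type carries its own measurement grouping and its own number of digits. The stream names, weights, and the `stream_score` function are hypothetical, and each measurement is assumed to be normalized to a value between 0 and 1.

```python
# Hypothetical profiles: fewer measurements and fewer digits for simpler streams.
STREAM_PROFILES = {
    "confirmation": {
        "digits": 2,
        "weights": {"inbox_delivery_rate": 1, "open_rate": 1},
    },
    "transactional": {
        "digits": 3,
        "weights": {"inbox_delivery_rate": 2, "open_rate": 1, "ctr": 1},
    },
    "promotional": {
        "digits": 4,
        "weights": {"inbox_delivery_rate": 2, "open_rate": 1, "ctr": 2, "conversions": 3},
    },
}


def stream_score(stream_type, measurements):
    """Score one deployment using the profile for its stream type."""
    profile = STREAM_PROFILES[stream_type]
    weights = profile["weights"]

    # Every measurement in the stream's grouping must be present.
    missing = [name for name in weights if name not in measurements]
    if missing:
        raise ValueError(f"missing measurements: {missing}")

    top = 10 ** profile["digits"] - 1  # 99, 999, or 9999
    weighted_sum = sum(w * measurements[name] for name, w in weights.items())
    return round(top * weighted_sum / sum(weights.values()))


print(stream_score("confirmation", {"inbox_delivery_rate": 1.0, "open_rate": 0.5}))  # 74
```

Keeping the profiles in a plain table makes it easy to compare how two stream types are weighted, and a B2B or B2C variant could simply be another entry.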
Once calculated by your ESP, a detailed summary of each score, or a SWOT (strengths, weaknesses, opportunities and threats) analysis, will be produced and issued to each marketer with commentary, so that improvements can be made on subsequent deployments. Remember, the Quality Email Score may be broken down further by assessing a score for each measurement. Another example could include a Quality Score for each segmented list, which could be used to evaluate “what if” scenarios. MailChimp offers a general idea of this with its “list activity score.” The major benefit here would be a simplified method of testing before, during, and after each deployment, as well as an easy way to predict return and subscriber engagement.
These various scores would then become part of an ongoing analysis of each marketer, allowing ESPs to further assess the history of a potential client. For example, since not all ESPs measure deliverability the same way, this method would bring consistency across all ESPs. In this way, it is similar to a credit score. The goal, of course, is to give each marketer enough feedback in the SWOT evaluation so that future scheduled mailings will be more relevant and produce better overall engagement, eventually earning a higher score. The initial goal is to create a core group of measurements that will be universally accepted by each ESP.
This idea is certainly an uphill climb, and is merely an endeavor to encourage thoughts on what “standardized” email measurement can grow to be, whether it’s as simple as four core measurements or as complex as 50. I understand that there are several milestones to accomplish before this idea can even seriously be considered, but I thought I’d put it out there for now to encourage a possible future vision.




As if there was ever any question why Fred is part of the eec Measurement Accuracy Roundtable, here's the answer!
Posted by: John Caldwell | October 19, 2009 at 04:15 PM
FTR, when we launched the list activity score in March:
http://www.mailchimp.com/blog/your-list-activity-score-and-deliverability/
the idea was mainly to automate abuse detection, by discriminating based on how clients behaved, not how they *promised* to behave.
It's turned into an extremely valuable decision making tool for our abuse desk, especially for consulting with clients experiencing delivery issues. The numbers don't lie.
It's got its pros and cons. A couple months ago, we passed the idea around to some very well-known deliverability and anti-spam experts in the industry. They offered some tips on how to tweak our algorithm that we're currently considering.
Generally speaking, it's been great for the abuse desk, and for the most part, very good for overall deliverability. Very responsible senders that we otherwise never would've known (because they're small, and never would have asked us for a halo IP address) are rising to the top and getting outstanding delivery rates, and are very happy.
Bad senders are getting caught faster than we would've caught them "manually" or "reactively" via spamcop/haus, etc. and we're shutting them down before they do too much more (repeated) damage.
Not-so-great senders are the ones who need to clean up their act, very fast.
We're about to make "list activity" aka "engagement" transparent all the way down to the per-subscriber level, and tie it to segmentation. We'll see what that does.
Posted by: Ben Chestnut | October 20, 2009 at 10:20 AM
Nice article, Fred. I am already using something similar to that complementing results with sprinkles of simple descriptive statistics.
What you are proposing is similar in essence to what the search engines do and would involve a massive amount of work on the ESP side. Let me explain.
To get started, I believe subjective and objective (or whatever you want to call them) measurements should be separated. Objective are the ones that depend more on the infrastructure, sender rating, overall deliverability and rendering - they are the ones marketers can do something about without the recipients' involvement. Subjective ones are influenced by the recipients themselves and can include relevancy (in segmentation), list engagement, content, creative, etc.
I can argue that we already have a measurement like this - it's called ROI (let's leave service messages aside for now). Every marketer uses it but everyone has different contributing metrics to it.
I think the ESPs can contribute to the 'objective' measurements (something MailChimp is already doing) and provide a flexible platform for the marketers to assign and mix and match the scores THEY are interested in, instead of using a 'one size fits all' approach, because it doesn't.
Posted by: Alec Saiko | October 20, 2009 at 11:05 AM
Hi,
Very interesting article. You will be interested to find out that we have had something like that in place for a year. We use it as the basis for how we deliver emails. We also use our internal score to accept or reject a delivery automatically.
This is basically how we guarantee deliverability results to our customers.
So yes, it works, but the math and the tech behind it are much more complicated than what you describe and difficult to replicate.
Posted by: Nicolas Toper | October 21, 2009 at 03:04 PM
Hi Nicolas,
Thanks so much for your comment. Yes, this is an uphill climb and complicated, especially with subjective and objective measurements as Alec mentioned. Successful examples like yours only help to galvanize thought leaders within the industry to generate a blueprint of objective measurements that can be standardized. So, when marketers switch from one ESP to another, a cumulative score, along with their domain reputation, will be part of the evaluation process for ESPs.
Posted by: Fred Tabsharani | October 21, 2009 at 03:36 PM