Sunday, September 1, 2013

The Beauty of Bell Curves, with applications to perfume

As of today, September 1, 2013, at 10:34 pm Berlin time, the Parfumo database lists 31,748 perfumes. For the past few months, I have been attempting to pull all of my perfume reviews together in a searchable archive. I'm not quite finished, but as of right now, there are 2,114 reviews. 

The goal of Parfumo appears to be to list all perfumes in existence, whether discontinued or readily available. My goal? Just to travel around the olfactory universe a bit. Do I have any intention of reviewing another 28K perfumes? No, of course not. Maybe I'll call it quits when I reach 3K, but one thing is clear: I have no desire to sniff the vast majority of perfumes in existence. Why? The answer, my fragrant friends, lies in the beautiful bell curve:

Bell curves depict a "normal" or Gaussian distribution of some quality or thing covering a fixed range. They are perfect for evaluations of things which come in all sorts of varieties and specifically when we choose to rate those things using simple numerical scales. I rate perfumes on a scale from 1 to 10, with 1 being rock bottom scrubber, and 10 being incredibly wonderful, even transcendent. In a normal distribution of things along the x axis, the number of items (shown on the y axis) at the lowest end will be matched by the number of items at the highest end. The ratings eventually drift off to nothingness at both ends when continuous and not discrete measures are used.

Bell curves are useful for handling lots of things in real life, believe it or not. Take people, for example, but let's also stick with perfume. Of the people you happen to encounter in your life, how many of them are completely anosmic? Probably not very many. If we were to graph that trait, I imagine that only a tiny number would be completely anosmic, and it might well match the number of people who are hyperosmic to the point of throwing a fit whenever anyone wears a scent in their presence. Most people fall somewhere in between. 

There are lots of statistical nuances between mean and median, and so forth, but for our purposes, to think about what we should expect when we set out to test perfumes, this simple bell curve is good enough.

The numbers didactically displayed on this particular version of the bell curve indicate that 68% of things fall right down the middle of the range, within one standard deviation of the average, and about 95% of things fill the broad underbelly of the curve, within two standard deviations. Most things, including perfumes, are average. Some are above average; some are below average. The extreme outliers are roughly the top 2% and the bottom 2%. 
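For anyone who wants to check those percentages rather than take the picture's word for it, here is a minimal sketch in Python. It assumes an idealized normal distribution and uses only the standard library's error function; the helper name `share_within` is my own invention for illustration.

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

within_one = share_within(1)          # the tall middle of the curve
within_two = share_within(2)          # the broad underbelly
each_tail = (1 - within_two) / 2      # what is left over, split between the two ends

print(f"within 1 standard deviation:  {within_one:.1%}")
print(f"within 2 standard deviations: {within_two:.1%}")
print(f"each tail beyond 2:           {each_tail:.1%}")
```

The middle comes out a little over 68%, the underbelly a little over 95%, and each tail a bit more than 2%, so "the top 2% and the bottom 2%" is a fair rule of thumb.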

Let's think about those numbers for a moment. If you test 100 perfumes, how many, realistically speaking, should be masterpieces? Well, if perfume ratings are plotted along a normal distribution, then you should expect the number of scrubbers to pretty much equal the number of masterpieces. This is not to say that any two perfume wearers will agree on which ones those are. 

Every single trait, every sensitivity to every scent (and ingredient) included in a perfume, and every single taste can also be understood in terms of a bell curve distribution. Consequently, we should not expect to see all that much convergence in opinions about perfumes, it seems to me. What we should expect, however, in every case, is that to any given perfume wearer, most perfumes should be average. This might even be a tautology.

I have often wondered how I could explain my innate disdain for people who throw temper tantrums about perfumes which they happen not to like. Yes, they are childish, of course, but how and why exactly? The answer, my fragrant friends, lies here in the bell curve. We should rationally expect the worst 2% of all perfumes to be just that. It would be crazy to expect more than 2% of all perfumes to be masterworks, would it not?

Not so fast, sherapop. It's all going to turn on the reviewer. If someone actually expects every perfume to be a masterpiece, then he will be disappointed 98% of the time. On the other hand, if a person has no powers of discrimination, then everything will smell great. Will it not? But what, you may by now be wondering, does the actual perfume rating distribution look like for a real live perfumista?

Inspired by Undina, the reigning Queen of Perfume Stats, I set out this afternoon to plot my very own ratings and determine whether or not I evaluate perfumes numerically in accordance with a normal bell curve distribution. Here's what I found:

As you can clearly see, my ratings lean to the right, relative to a normal distribution. Although my numbers of absolute scrubbers and masterpieces are very similar, I am more generous in bestowing ratings on the wearable perfumes in the broad underbelly of the curve. I give an unexpectedly large number of 6's and 7's, which is probably because my approach is to award ratings based on an "all things considered" system. I take into account the cost of perfumes which strike me as an exceptionally good value, and as a result, I'll give a Molinard a 7 which I might have given only a 5 or a 6, had it not cost me nearly nothing.

All of this makes me wonder whether I should stop doing "all things considered" ratings. Then I remember that people may read my reviews, and they may be looking for some useful advice, especially if they already know that we have similar tastes. Keeping those people in mind, I feel that I should continue on with my "all things considered" ratings. So if a perfume costs $800 and smells like an average designer launch, then I may give it only a 4 instead of a 5, even though it is completely wearable. In this way I am basically expressing my opinion that it is not a perfume which I would recommend purchasing. Why? Because I don't see the point in spending $800 for a perfume very similar to a perfume which costs $80.

There is another possible explanation for the right-side heaviness of my curve: I do not seek out perfumes which I am fairly sure a priori will not smell good. I tried a couple of the Coty drugstore scents, and they were so horrible that I simply decided to avoid liquids in that general territory. This means that I am not really sampling a normal distribution of perfumes. I am not randomly spritzing in the dark. I decide to sample some perfumes and not others, and that selection process weeds out more of the perfumes that might have shown up in the 2, 3, and 4 rating region of the curve. 
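The selection effect described above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the "true" appeal of perfumes is taken to be normally distributed around 5.5 with a standard deviation of 1.5, my advance guess about a perfume is its true appeal plus some noise, and I only bother to test perfumes whose guess is at least 4.5. None of these numbers come from my actual data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same result on every run

def to_rating(score: float) -> int:
    """Round a raw score to a whole number and clamp it to the 1-10 scale."""
    return max(1, min(10, round(score)))

# Hypothetical universe of perfumes: appeal centered on 5.5, spread of 1.5.
N = 20_000
true_appeal = [random.gauss(5.5, 1.5) for _ in range(N)]

# Hypothetical advance guess (reputation, house, notes list): appeal plus noise.
guess = [a + random.gauss(0, 1.5) for a in true_appeal]

# Ratings if every perfume were tested versus only the promising-looking ones.
all_ratings = [to_rating(a) for a in true_appeal]
tested = [to_rating(a) for a, g in zip(true_appeal, guess) if g >= 4.5]

def low_share(ratings):
    """Share of ratings that fall at 4 or below."""
    return sum(r <= 4 for r in ratings) / len(ratings)

mean_all = statistics.mean(all_ratings)
mean_tested = statistics.mean(tested)

print(f"mean rating, everything tested:    {mean_all:.2f}")
print(f"mean rating, only promising ones:  {mean_tested:.2f}")
print(f"share rated 4 or lower, everything: {low_share(all_ratings):.1%}")
print(f"share rated 4 or lower, selected:   {low_share(tested):.1%}")
```

Even with a fairly unreliable guess, skipping the unpromising bottles thins out the low ratings and pulls the average upward, which is the same direction in which my own curve leans.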

I also test a lot of niche perfumes. Whether or not I happen to think that they are great, they are often very nice, and because I care very much about high quality ingredients, even a solid niche perfume which does not break any new ground may receive a 7 from me, just because it smells so nice. 

Now I'd like to open up the floor. What does your ratings distribution look like, and why? Can you think of other explanations for the leaning to the right of mine? Of course, I could just be indiscriminate, but that does not explain why the far left and the far right do appear to be normal. Or are they? I need to do a quick calculation. 

The total number of reviews is 2114, so 2% is about 42! This means that my extreme termini are unexpectedly low--at both ends of the graph. 
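The quick calculation can be written out as follows. This is a rough sketch that assumes the 1's and the 10's correspond to the two tails beyond two standard deviations of a normal curve, which is my simplification and not something the rating scale guarantees.

```python
import math

total_reviews = 2114

# Rule-of-thumb figure: 2% of all reviews at each extreme.
rule_of_thumb = 0.02 * total_reviews

# Slightly more careful figure: the share of a normal distribution
# lying beyond two standard deviations on one side.
tail_share = (1 - math.erf(2 / math.sqrt(2))) / 2
expected_per_tail = tail_share * total_reviews

print(f"2% of {total_reviews} reviews: about {rule_of_thumb:.0f}")
print(f"normal-curve tail of {total_reviews} reviews: about {expected_per_tail:.0f}")
```

Either way, the expected count at each end is in the forties.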

So it's really true, after all: Most perfumes are average! Or at least I believe that they are...


  1. So can you tell us about those 2%? :) I mean, would you please list all those 42! :) Thanks

    1. Hello, Noraniom, and welcome to the salon de parfum!

      Are you interested in my 10's or my 1's? I can provide a link to either one, just let me know!

    2. Thanks Shera Pop :) I'm interested in your Top 10's :)

    3. Here you go. I'll give you my 10's and my 9's since they only add up to 38!



    4. Fascinating! I knew our tastes differed, but there is only one that you've listed that I would absolutely agree belongs at the far right - L'Air du Desert Marocain. And Avignon and Chinatown, OK, I could see 9s for them. Now I want to see your 1s, if you would - I'm sure some of my favorites are going to be there! ;) (I do 1-5 ratings and this makes me want to go digging in my number pile when I get a chance...)

    5. Happy to oblige, pitbull friend!


    6. Now that is fascinating. NO strong disagreements with you there, though I don't dislike Guerlain AA Cherry Blossom.

    7. Maybe your masterpieces will be among my 2's! :-)


  2. I haven't tried - let alone reviewed - as many perfumes as you have but I quickly ran my own statistics. My categories aren't as straightforward as your ratings but I worked out a mapping (for the rating 5 I use those perfumes that I still test and haven't decided if I like them or not). Those constitute 38%. But then the results are skewed: 19% got rating 1 and 14% got 9/10.

    1. Thank you, Undina, it is an honor to have you, the Queen of Stats, weigh in on this post! ;-)
      I have to say that this was a lot of fun, and I have you to thank for the inspiration! I don't think that I have the graphing savvy to produce your splendid displays, but I now fully appreciate the appeal of this sort of endeavor. Uh-oh...

      As for your ratings: that's an interesting system. I think that the perfumes in my house waiting to be tested probably are going to be, on average, 6's or 7's, for the reasons I give above, and in the light of my previous results. So calling them 5's in my case would be to underestimate their probable value.

      How do your numbers look in the 2 and 9, and the 3 and 8 range? Is your distribution looking normal? It looks as though you are more adamant in your ratings, with 19% (not 2%) of 1's and 14% of 9's, compared to my meager numbers at the extreme ends of the graph...

      I recall that when I was a grad student and serving as a preceptor (at a hoity-toity university), I had a conversation with my adviser about why I gave so few A's on papers. The students were all expecting A's--after all, they got into THAT school!--so they hated getting a B on a paper. But my argument was: if you give everyone A's then it doesn't mean anything anymore.

      Looking at my perfume review stats, it appears that I have not changed my attitude about this in the least!

    2. Oh, I see: your 14% is for 9's. Have you given any 10's, then?

    3. 14% is a combined number for both 9 and 10 where 2% account for my 10s.

      As to the 5s, I wasn't talking about the perfumes that were waiting to be tested - those I don't know. I was referring to perfumes that are still in the "testing" phase. Usually those are the perfumes about which I cannot make up my mind - that's why I gave them the rating 5.

    4. Thanks for the clarifications, Undina!

      That's interesting that you can be undecided after testing. I usually feel prepared to slap a rating on after a single wear, though I do sometimes later change my mind...

  3. Thank you for undertaking this study. Although my testing is limited in scope, I agree that much of what's new is not worth a second wearing. When I first began to learn about fragrance a few years ago, I bought the older perfumes because they were affordable and, I assumed, made of good quality ingredients. I'm so glad I started training my nose in this way--if I'd begun with most contemporary frags, I might well have given up the hunt for olfactory enlightenment.

    1. I concur, Deb. If my journey had commenced at the current Sephora wall, I'd probably be reviewing teas right now instead. ;-)

  4. Your numbers are skewed left (the tiny tail is the reference to the skew), and quite honestly the skew is not unexpected. People tend to rate things highly. On a 10- or 11-point scale, the pattern we'd most likely see is a preponderance of 7s, 8s, & 9s. In fact, I'd say your numbers are probably more balanced than most reviewers given the relatively few 8s and the preponderance of 6s & 7s. You also may have a preference for the number 7, and many people do, but that's another topic.

    There is another way to look at your ratings. Your median or mid-point rating is between 5 & 6. The way I see it, you do have a notion of what is average (it's between 5 & 6). Your perfume sampling is self-selected, as you mention. If you start making your way through some mass-market junk, I bet your median rating would start trending to 5, maybe lower ...

    The bell curve infrequently exists in the real world. It's an assumption we make so that we can apply statistical models (that presume normal data) to real world data (that are anything but normal). Something as simple as a mean (the arithmetic average) is based on a presumption of normal data, not to mention all the techniques based on means (an entire family of linear models, like regression & anova, for instance). Most statistical techniques we use are based on the mean, yet most of our data shouldn't use the mean. Whether or not it matters is an entirely different question.

    nb - You'll probably get a kick out of this. One of my favorite articles on this topic is titled "The Unicorn, The Normal Curve, and Other Improbable Creatures" (Micceri 1989).

    1. Thank you very much for the insights, Awesomeness, and also for correcting my use of the term 'skewed'! I'll change that to the vernacular 'leaning' posthaste!

      Interesting observations. My favorite number is actually 6, so maybe that pulls me toward 7? LOL

      I know that many biological traits are not really distributed normally, because they depend on all sorts of evolutionary factors, but what about preferences? I suppose that it's true that half will be to one side of the mode (is that the right word?), and half will be on the other side. But as far as the actual assignment of ratings is concerned, it's bound to turn on the person. Some people do seem to like almost everything they sniff. Some people are much more picky. I think that I'm pretty typical--again, defining myself as "normal"--ha!

      Isn't the situation with perfume analogous in many ways to food: some people are gluttons; others are epicureans, and some people are just not very interested in food at all.

      I've also been thinking about the "perfume is art" crowd. Maybe they think that every perfume is a work of art and so that inclines them to rate more highly? I tend to think that much of the stuff being pumped out today is truly swill and generated solely on the basis of market data and under restrictive constraints--both financial and creative, whenever "managers" get to weigh in on what finally gets produced. (Which is always, isn't it? Except in the case of small indies?)

      I'll take a look at your reading suggestion. Thanks so much again!

