It’s easy in the realm of self-funding to be overwhelmed with too much information. Where do you turn to get answers to seemingly simple questions? A client might ask what their claims costs are and what’s driving these costs. They might want to know the percentage of claims that are emergency room visits and how frequently they occur.

From claims frequency to utilization, do you know where to turn to get answers? You might think a sophisticated reporting mechanism will provide the information you need only to find that, in most cases, what you’re given is overkill. Sure, reporting tools are helpful. They provide a lot of useful data and even offer predictive modeling, risk profiling, forecasting and benchmarking. However, a proven way to answer questions relative to your client’s claims data is to use what is commonly referred to as “descriptive statistics.”

With healthcare costs continually on the rise, self-insured employers need to be more cost savvy than ever, even if that means coming in a hair below the charted norm. Where will your client be in the next three years? If self-insured employers have no grasp on their healthcare costs, does it even make sense in this environment for an employer to self-manage their health insurance costs? Yes, it does. By learning and applying a simple set of statistical principles, you can help your client understand their claims costs and find ways to reduce their spending. You don’t need to be a mathematician or depend on sophisticated reporting functions to get an overall digest of what’s happening within a given employee population. You just need an understanding of the basics. Using what is known as descriptive statistics, brokers can help clients evaluate claims information, identify recurring claims expenses, and get on the right track to create and implement cost-effective plan designs and cost-containment programs.

Mathematically speaking, descriptive statistics quantify a data sample and summarize the essential elements within that data set. As a broker, this means that just a few formulas can provide a solid summary of a client’s health benefit data, allowing you to survey and perform a fundamental evaluation of their claims cost and utilization. The methods outlined below can provide your clients with a clear window into the complex world of claims data. To get started, here’s what you need to know:

**Distribution**

Distribution is most often used to identify the frequency of a value or range of values for a specific variable. A variable is simply a category whose value can change from one record to the next. Distribution can answer any number of questions about a client’s employee population: What is the ratio of male to female employees? What percentage of employees falls into each age group? Who are my client’s top 10 providers? An easy way to find the distribution is to list each value for a variable (for example, the price of each claim) and then count the number of items that had that value or fell within a set range (for example, the number of claims priced between $1 and $500, between $501 and $999, and so on). Dividing each count by the total number of claims gives the percentage of claims in each price range. As distribution is commonly displayed using graphs, percentages or both, the graph below illustrates this example:

The above bar graph identifies the volume of claims for each price range. This data can answer any number of questions, but most importantly, it focuses the client’s attention on the obvious: 45% of their claims fall between $1,000 and $5,000. The next step is to ask why. This information not only suggests what future claims costs may be, but also helps the client identify the reasons behind this cluster of claims. Is it an anomaly that so many of the claims are in this cost range, or is there a common diagnosis driving these costs? Is there a wellness or disease management program the client could implement? Are the employees using in-network providers? If not, why? Although the client would need more detail to answer these questions, the above graph helps the client focus their attention on potential issues and apply the appropriate resources to reduce costs.
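
For readers who prefer a script to a spreadsheet, the counting step described above can be sketched in a few lines of Python. This is only an illustration: the claim amounts and the price bands below are made up for the example and are not the data behind the graph.

```python
from collections import Counter

# Hypothetical claim amounts in dollars (illustrative, not real client data).
claims = [120, 450, 480, 700, 950, 1200, 1500, 2300, 4800, 7500]

# Assumed price bands: (label, lower bound inclusive, upper bound exclusive).
bands = [
    ("under $500", 0, 500),
    ("$500 to $999", 500, 1000),
    ("$1,000 to $4,999", 1000, 5000),
    ("$5,000 and over", 5000, float("inf")),
]

def band_for(amount):
    """Return the label of the price band that contains the amount."""
    for label, low, high in bands:
        if low <= amount < high:
            return label
    raise ValueError(f"no band for amount {amount}")

# Count the claims in each band, then convert the counts to percentages.
counts = Counter(band_for(amount) for amount in claims)
distribution = {label: counts[label] * 100 / len(claims) for label, _, _ in bands}

for label, pct in distribution.items():
    print(f"{label}: {counts[label]} claims ({pct:.0f}%)")
```

With this sample, the largest share of claims (4 of 10) lands in the $1,000 to $4,999 band, which is the kind of cluster that should prompt the "why" questions above.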

**Central Tendency (Mean, Median and Mode)**

When evaluating your client’s data, you might need to identify the most common occurrence of a given variable. Your client might ask, “What is the average claim cost?” “What prescription, provider or network is being utilized the most?” “What is the frequency of office visits?” All of these questions can be answered by finding the central tendency.

Central tendency gives you an idea of the middle values in your data. There are three major measures of central tendency: mean, median, and mode. It’s important to use all three to develop a broad picture of the middle values. Doing so will help you identify outliers (a value or set of values in the data sample that is widely separated from the rest), which may be distorting your analysis.

Finding the central tendency is useful for several reasons: you can compare it against a benchmark, identify problem areas to target in order to reduce claims expense, or budget for future claims costs.

To find central tendency, you should first find the mean. You can find the mean, or average, by adding your values together and dividing by the total number of values in your data. For example, the mean claims cost is determined by adding the cost of all the claims and dividing by the number of claims in your sample. But be wary of using just the mean to analyze your data. One outlier could skew the results. Does the mean include one extremely high or low dollar claim? If you’re not scanning the claims for this detail, the mean won’t tell you. This is where the median, or middle value, can point out irregularities. If you find that the mean, median, and mode are all about the same, there is not a lot of variance in the data and the central tendency is a fair estimate.

The median determines where the data sits relative to the middle or 50th percentile. You can find the median in the example above by listing the claims in numerical order by cost and finding the midpoint. The median helps to determine whether your mean is near the true central tendency. Is the middle claim in this list close to the average claim cost? If not, and your middle claim is off by more than just a few (+/- 10) dollars, check your data for atypical costs.
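
To see how a single outlier pulls the mean away from the median, here is a minimal Python sketch using the standard library’s `statistics` module. The claim costs are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical claim costs in dollars; the last one is a single large outlier.
claims = [150, 300, 400, 450, 500, 600, 100000]

mean_cost = statistics.mean(claims)      # sum of all costs / number of claims
median_cost = statistics.median(claims)  # middle value once the costs are sorted

print(f"Mean:   ${mean_cost:,.2f}")
print(f"Median: ${median_cost:,.2f}")

# Recompute the mean without the outlier to see how much it was skewed.
mean_without_outlier = statistics.mean(claims[:-1])
print(f"Mean without the $100,000 claim: ${mean_without_outlier:,.2f}")
```

Here the median stays at $450 while the mean climbs above $14,000, a gap that signals the data should be checked for atypical costs.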

Much like distribution, the mode determines frequency in your data set. For example, suppose your mean and median are indicating that the average claim cost is $450. How often does that cost occur? And what is it for? You can find the mode by listing the costs in numeric order and counting how many times each specific cost occurs. The cost with the highest number of occurrences is the most frequent cost. Suppose it turns out that employees are frequenting an emergency room rather than a standard provider’s office, which explains the cost. In this example, you might suggest a higher co-pay on emergency room visits to encourage standard office visits, reducing the frequency of this cost and saving your client money.
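
The counting behind the mode can be sketched the same way. In this hypothetical example each claim is recorded as a cost paired with a place of service; both the figures and the field layout are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical claims as (cost in dollars, place of service) pairs.
claims = [
    (450, "emergency room"), (85, "office visit"), (450, "emergency room"),
    (120, "office visit"), (450, "emergency room"), (95, "office visit"),
    (450, "emergency room"), (2300, "outpatient surgery"),
]

# Count how often each cost occurs; the most frequent cost is the mode.
cost_counts = Counter(cost for cost, _ in claims)
modal_cost, occurrences = cost_counts.most_common(1)[0]

# Look at where the claims with the modal cost were incurred.
settings = Counter(setting for cost, setting in claims if cost == modal_cost)

print(f"Most frequent cost: ${modal_cost} ({occurrences} claims)")
print(f"Places of service for that cost: {dict(settings)}")
```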

**Dispersion**

Dispersion is the spread of values around the middle or central tendency and is commonly measured by either “range” or “standard deviation.” The range is the difference between the highest value and the lowest value in a data set. Using the example above, if the highest claim cost was $100,000 and the lowest claim cost was $150, the range of claims dollars would be $99,850. The importance of knowing the range is that it shows you the most extreme values in either direction, high or low. While this is useful information, the range depends entirely on those two extremes, so a single outlier can distort it. That is why it helps to look at a second measure of dispersion as well.
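
As a quick sketch, the range calculation above looks like this in Python. The highest and lowest costs come from the example in the text; the values in between are hypothetical filler.

```python
# Claim costs in dollars. The $150 and $100,000 claims come from the example
# above; the other values are hypothetical.
claims = [150, 300, 450, 1200, 5000, 100000]

highest = max(claims)
lowest = min(claims)
claims_range = highest - lowest  # range = highest value - lowest value

print(f"Highest: ${highest:,}  Lowest: ${lowest:,}  Range: ${claims_range:,}")
```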

Another useful, though more involved, measurement of dispersion is the standard deviation. The standard deviation can be a more accurate measurement of variation around the mean. It identifies whether values are concentrated around the mean or spread out. Graphically, the dispersion can be shown using a scatter plot like the following examples:

In scatter plot A, the data points are clustered close to the mean (the diagonal line), indicating little variance around the mean. In scatter plot B, the points are widely dispersed, meaning the data includes extreme values far from the mean. It should be noted that although standard deviation is a standard measure of this variation, the formula itself is complex and rarely worked out by hand. The easiest way to calculate it is to highlight the data sample in Microsoft Excel and apply the standard deviation function (STDEV).
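
If Excel is not at hand, Python’s built-in `statistics.stdev` computes the sample standard deviation, the same convention Excel’s STDEV uses. The two data sets below are hypothetical and are not the data behind the scatter plots; they were chosen to share a mean of $500 while differing in spread.

```python
import statistics

# Two hypothetical sets of claim costs with the same mean but different spread.
claims_a = [400, 450, 500, 550, 600]  # tightly clustered around the mean
claims_b = [50, 100, 500, 900, 950]   # widely dispersed around the mean

mean_a = statistics.mean(claims_a)
mean_b = statistics.mean(claims_b)

# Sample standard deviation (divides by n - 1).
stdev_a = statistics.stdev(claims_a)
stdev_b = statistics.stdev(claims_b)

print(f"Set A: mean ${mean_a:,.0f}, standard deviation ${stdev_a:,.2f}")
print(f"Set B: mean ${mean_b:,.0f}, standard deviation ${stdev_b:,.2f}")
```

Both sets average $500, yet the standard deviation for set B is more than five times that of set A, which is exactly the difference the mean alone would hide.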

Descriptive statistics provide only a summary and are limited in detail, so reporting tools may be used in concert with your efforts to help drill down and evaluate data. However, output from reporting mechanisms alone can be too cumbersome to analyze if you don’t first have a jumping-off point. Descriptive statistics are an important first step in cost-containment efforts, so reporting tools should not be the first place you turn to find answers. Blending reporting tools with descriptive statistics can guide your self-funded employers to cost savings by allowing you to provide appropriate, cost-effective benefit plan recommendations.

Of course, these statistics don’t need to be computed by hand; using simple software solutions on your computer such as Microsoft Excel can be an excellent resource for sorting or calculating sets of data. Or you can always reach out to your TPA to help with this kind of analysis.
