From collecting the perfect sample size to ensuring the integrity of your statistical interpretation, understanding all the components of statistics can be difficult to navigate. In a world where knowing and using statistics is becoming ever more relevant in policies, social movements and more, it can be helpful to learn or recap some of the basics. Here’s everything you need to know about descriptive statistics!
Being able to accurately interpret data is an essential statistics skill
Using big data analysis and statistical methods to understand the world around us may seem like a 20th century invention. While the statistical inference that statisticians conduct in the present day is much more powerful thanks to statistical software and programs, the work of the statistician is one of the oldest trades on the planet.
While the intricacies of Bayesian statistics or understanding categorical data deserve a lengthier explanation in the context of the evolution of statistical data and analysis, you can get the basics by looking at the origin of statistics in a nutshell.
While statistical inference intersects a plurality of other modern disciplines, such as biostatistics or business analytics, it started as a way to order or register the phenomena of early humans. Recording and analysing the movements of agriculture, astrology or commerce to improve sanitary, food and economic conditions were all ways our ancestors used inferential and descriptive statistics.
Attempting to define a domain like mathematical statistics, whose uses are ubiquitous, can be like trying to find a needle in a haystack, especially if that needle happens to be as headache-inducing as probability theory. Here to alleviate the confusion behind all things probability and statistics is someone who initially detested anything having to do with statistical analysis.
Admittedly, the jargon attached to statistical theory can get scary: ordinal and categorical data, sample data, population mean, percentile, Markov chain.
Behind the complex terminology, however, are concepts that are actually quite simple at their base. If you’re studying statistics, you will likely be taught hypothesis testing, which uses a probability distribution to test a null hypothesis against an alternative hypothesis. In layperson’s terms, hypothesis testing makes assumptions about raw data to construct hypotheses and then tests whether those hypotheses are plausible for that given set of data. Bayesian statistics, by contrast, starts from a prior belief and updates it in light of the observed data. Before diving further into prediction analysis and common statistical techniques, it can be helpful to start with the common ways in which you can use data visualization to analyse qualitative and quantitative data.
Anyone who’s had to produce a histogram, pie or bar chart for class or work – congratulations, you’ve participated in one of the most common ways data scientists conduct statistical analyses. Descriptive statistics are measures of central tendency and variability, which translates into summarising data by its average and by how far particular points fall from that average. Measures of central tendency can include metrics like:
Measures of variability, or dispersion, include things like:
While this may sound like an overly simplified process, conducting exploratory analysis with descriptive statistics is an integral part of every study design. Before mathematicians or data scientists concern themselves with multivariate linear regression or constructing a confidence interval with estimators, they have to know what their data contains.
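As a rough illustration of the measures of central tendency and dispersion described above, here is a minimal Python sketch using only the standard library. The `scores` list is invented for the example and does not come from any real data set.

```python
import statistics

# Hypothetical test scores, invented purely for illustration.
scores = [22, 25, 28, 30, 30, 31, 33, 35, 36]

# Measures of central tendency: where the "middle" of the data sits.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Measures of dispersion: how spread out the data is around that middle.
score_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(f"mean={mean_score}, median={median_score}, mode={mode_score}")
print(f"range={score_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
```

The sample versions of variance and standard deviation are used here on the assumption that the scores are a sample from a larger group; `statistics.pvariance` and `statistics.pstdev` would be the choice if the list were the whole population.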
You can complete a statistical data analysis with just descriptive statistics and their visualizations. One of the most beautiful examples of this dates back to the 1850s, when Florence Nightingale produced her famous “coxcomb” pie chart in order to extract vital information on mortality during the Crimean War. At a time when women in the field were virtually non-existent, Nightingale paved the way for innovative, under-represented groups in statistics.
Seemingly similar statistics can have wildly different impacts on policy
Another important aspect of generating descriptive statistics is that many statistics or regression models necessitate certain assumptions in order for them to be valid. While these assumptions vary from model to model, one of the most common requirements is that the data be normally distributed. A normal distribution is a symmetric, bell-shaped probability curve, and the central limit theorem explains why it turns up so often: the averages of many independent samples tend towards it. Much real-world data doesn’t follow a normal distribution, which is why you’ll see many statisticians transform their dependent or independent variables.
Using software like SPSS, R or Excel, anyone can easily extract these metrics of central tendency and dispersion from the data. If the data is normally distributed, these metrics become extremely powerful. In finance, for example, the distribution of the data and the percentile under which certain prices or stocks fall are used to understand the advantages or risks of potential trade deals.
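To make the idea of a percentile a little more concrete, here is a small Python sketch. The `prices` list and the `percentile_rank` helper are hypothetical, made up for this example rather than taken from any finance library.

```python
import statistics

# Hypothetical daily closing prices, invented for illustration only.
prices = [101.2, 99.8, 102.5, 98.7, 100.1, 103.4, 97.9, 100.9, 101.7, 99.3]

# Quartiles split the sorted data into four equal parts;
# the middle cut point (q2) is the median.
q1, q2, q3 = statistics.quantiles(prices, n=4)


def percentile_rank(values, x):
    """Return the percentage of values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


print(f"quartiles: {q1:.2f}, {q2:.2f}, {q3:.2f}")
print(f"percentile rank of 100.1: {percentile_rank(prices, 100.1):.0f}")
```

There are several conventions for computing quantiles, so other tools such as SPSS, R or Excel may give slightly different cut points for the same data.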
As you’ve probably noticed by now, descriptive statistics is very distinct from the other main branch of statistics: inferential statistics. While inferential statistics uses the data to try to make predictions about the populations using statistical models, descriptive statistics merely describes what is actually in the data.
Using descriptive statistics to analyse categorical and numerical, observational data is a type of statistical methodology that people utilize when they want to, for example:
This type of analysis, as opposed to regression analysis or ANOVA, is typically called univariate analysis because it tends to analyse only one variable at a time.
While aspects of statistics like chi-square analysis, confidence intervals, or the correlation coefficient can be very enlightening – sometimes, all you need are descriptive statistics. Take the following numbers into account:
Let’s say these numbers pertain to a set of data on your class’s test scores. You want to understand how the class performed but aren’t sure how to set up your experimental design. Assuming the data follow a normal distribution, we know that roughly 68% of scores are within one standard deviation of the mean, 95% are within two standard deviations and 99.7% are within three.
Without using statistical significance, randomization, or the least squares method, you are able to figure out that 95% of the class scored between 22 and 38 points where:
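The interval of 22 to 38 points quoted above spans two standard deviations either side of the mean, which is consistent with a mean of 30 and a standard deviation of 4. Those two values are inferred from the interval for the sake of this sketch rather than quoted from the data, so treat them as assumptions.

```python
from statistics import NormalDist

# Assumption: mean and standard deviation inferred from the
# "between 22 and 38" interval, i.e. 30 plus or minus 2 * 4.
mean, sd = 30, 4
dist = NormalDist(mu=mean, sigma=sd)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    lower, upper = mean - k * sd, mean + k * sd
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(f"within {k} sd ({mean - k * sd} to {mean + k * sd}): {share_within(k):.1%}")
```

The computed shares are close to the familiar 68%, 95% and 99.7% figures, which are themselves rounded.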
Another statistical giant in the realm of descriptive analysis is correlation, which is a number that describes the relationship between two variables. While you probably already know this, make sure that you fully understand the difference between correlation and causality. Correlation is a mathematical tool to understand how changes in one variable relate to changes in another, while causality is the notion that changes in one variable cause changes in another.
If you’ve ever heard a mathematician joke, then you’ll definitely appreciate the common example statisticians give when elucidating the correlation-does-not-equal-causality point. Take, for example, hand size and age. If you were to plot a sample of people of various ages, you would likely see a relationship where the larger the hand, the older the person. While there is clearly a correlation, or relationship, between hand size and age, there is unlikely to be any causality. If there were, this would mean that, were your hands to get smaller or disappear, you would either regress in age or die, respectively.
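For readers who want to see the correlation number itself, here is a minimal sketch of the Pearson correlation coefficient in Python. The ages and hand lengths are invented for illustration, and `pearson_r` is a hypothetical helper written out by hand so that each step is visible.

```python
import math

# Hypothetical sample: ages in years and hand lengths in centimetres.
ages = [2, 5, 8, 12, 16, 20, 30, 40]
hand_cm = [7.5, 10.0, 12.5, 15.0, 17.5, 18.5, 18.8, 18.7]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


r = pearson_r(ages, hand_cm)
print(f"correlation between age and hand size: {r:.2f}")
```

A value near 1 indicates a strong positive relationship, but nothing in the calculation says which variable, if either, causes the other.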
Moving on to the other main branch of statistics, inferential statistics is what people normally think of when they call up images of statistics. Relying on probability theory to create statistical models to draw inferences or calculate an estimator on one dependent variable or more, inferential statistics can be hard to define. However, the most important characteristics of inferential statistics can be boiled down to one sentence: the branch uses sample data on a population to make predictions outside that data set.
Distributions and probability can be taught at a young age
From binomial distributions and outliers to finding the perfect parametric test, understanding or remembering all of statistics’ components is an impossible feat. However, there are plenty of ways to either learn or improve your statistics skills.
Use the many websites dedicated to explaining statistical concepts like a random variable or analysis of variance. Here are some of the best tutorial or troubleshooting guides online:
If you prefer one-on-one help, make sure to check out Superprof’s community of over 140,000 maths teachers in the UK. From probability to regression, you can try a lesson for the average price of 10 pounds an hour!