Statistics - Analytics


Statistics is commonly misunderstood, as it can be:

Contents - Introduction

Statistics and analytics have their origin in mathematics.

A long historical background exists with:
  1. philosophic connection
  2. measuring the human environment
  3. predicting the unknown

References to the meaning of words are collected in the glossary: Analytics - mathematics

References on abuse and (mis)interpretation: Data dredging (data fishing, data snooping) is the inappropriate (sometimes deliberately so) use of data mining to uncover misleading relationships in data. Data-snooping bias is a form of statistical bias that arises from this misuse of statistics. Any relationships found might appear valid within the test set, but they would have no statistical significance in the wider population.
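The effect is easy to reproduce. The sketch below is an illustration only, not taken from any of the referenced sources: it correlates 200 columns of pure random noise with an equally random target and counts how many pass a rough 5% significance threshold. The function names, the sample sizes and the large-sample threshold 1.96 / sqrt(n) are all assumptions chosen for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_rows=50, n_features=200, seed=1):
    """Count noise features whose correlation with a noise target looks 'significant'."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    # Rough two-sided 5% cut-off for a correlation (large-sample approximation).
    threshold = 1.96 / math.sqrt(n_rows)
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


hits = count_spurious_hits()
print(f"{hits} of 200 pure-noise features look 'significant'")
```

Roughly one in twenty noise columns is expected to pass, although none of them has any relationship with the target. Testing many candidates and reporting only the ones that pass is exactly the fishing described above.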

Statistics Abuse

data fantasy

Correlation - causation.

cognitive biases
You could call it pride & prejudice.

tools Missing quality
Numerics_in_Excel: The purpose of this paper is to evaluate the accuracy of MS Excel in terms of statistical procedures and to conclude whether MS Excel should be used for (statistical) scientific purposes or not. The evaluation is made for Excel versions 97, 2000, XP and 2003. According to the literature, there are three main problematic areas for Excel if it is used for statistical calculations.
Many things have probably improved at Microsoft since then.

Wrong analyses Boeing 757

stating missed aging of airplanes
Most statistics require an evenly or normally distributed pattern of data. “Normally distributed” means the basic hill shape we envision, like the one a graph of repeated coin flips shows.
However, many (most?) types of investment data are not evenly or normally distributed. Applying standard statistics to such information produces unusable results, yet these are often cited as meaningful and used as the basis of some conclusion.
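A small sketch of why this matters, using made-up data: the sample below is drawn from a Pareto distribution, chosen here only as an assumed stand-in for heavy-tailed data (it is not real investment data). On such data the mean is no longer a "typical" value, because a few extreme observations pull it far above the median.

```python
import random
import statistics


def summarize_heavy_tail(n=10_000, alpha=1.5, seed=1):
    """Return mean, median and the share of observations below the mean."""
    rng = random.Random(seed)
    # Pareto-distributed values: most are small, a few are very large.
    data = [rng.paretovariate(alpha) for _ in range(n)]
    mean = statistics.fmean(data)
    median = statistics.median(data)
    share_below_mean = sum(x < mean for x in data) / n
    return mean, median, share_below_mean


mean, median, share_below_mean = summarize_heavy_tail()
print(f"mean={mean:.2f} median={median:.2f} below mean={share_below_mean:.0%}")
```

For a symmetric hill-shaped distribution about half of the observations lie below the mean. Here a clear majority does, so summaries that assume the hill shape (mean plus or minus a few standard deviations) describe this data badly.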


Statistical proof



One of the most famous sayings about statistics is the line: “There are three types of lies: lies, damned lies and statistics.” It was popularized by author Mark Twain (Samuel Clemens), who attributed it to British statesman Benjamin Disraeli. There is also a book entitled “How to Lie with Statistics”.

Statistical mistakes

Big data - too big expectations - noise

Statistical failure: L'Aquila earthquake

Italian scientists get 6 years for L'Aquila earthquake statements (cbsnews)

The L'Aquila earthquake – Could have known better? (theusrus martin march 2013)

Getting trapped by perfect information

Statistical fraud

Some publications presented as science have gone too far in stretching the truth.
The Dutch news I have followed:

Use & Abuse of statistics - (mis)interpretation

Publishing articles (kpi)
D.M. Murray: Design and analysis of group-randomized trials in cancer: a review of current practices.
Only 18 (24%) of the 75 articles documented appropriate methods for sample size calculations. Only 34 (45%) limited their reports to analyses judged to be appropriate. Fully 26 (34%) failed to report any analyses that were judged to be appropriate. The most commonly used inappropriate analysis was an analysis at the individual level that ignored the groups altogether. Nine articles (12%) did not provide sufficient information.
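Why ignoring the groups is a problem can be shown with the standard design effect for equally sized groups, 1 + (m - 1) * ICC, where m is the group size and ICC the intraclass correlation. The sketch below is an illustration with hypothetical numbers; the function names and the example values are assumptions and do not come from the reviewed articles.

```python
def design_effect(group_size, icc):
    """Variance inflation caused by randomizing whole groups of equal size."""
    return 1 + (group_size - 1) * icc


def effective_sample_size(n_groups, group_size, icc):
    """Number of independent individuals the grouped sample is 'worth'."""
    return n_groups * group_size / design_effect(group_size, icc)


# Hypothetical trial: 20 groups of 50 people, intraclass correlation 0.05.
deff = design_effect(50, 0.05)
n_eff = effective_sample_size(20, 50, 0.05)
print(f"design effect={deff:.2f}, effective sample size={n_eff:.0f} of 1000")
```

Even a small correlation inside groups shrinks the information in the sample considerably. An analysis at the individual level that treats all participants as independent therefore overstates its precision.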

comment to: Ben Goldacre - Bad Pharma (Idea Pharma).


Interpretation of relationship (correlation)

sample body height
Protestantism is concentrated in the north and east and Roman Catholicism is concentrated in the south and west of Europe, as a result of the Reformation.
The people of Northern Europe are taller than those in Southern Europe. Human height
There is strong statistical evidence of a relationship between human height and religion.
No one will believe (I hope) that changing religion will influence your body height. Both are tied to where people live, and that shared geography produces the correlation.
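The mechanism can be sketched with synthetic data. Everything in the example below is invented for illustration (the two regions, the shares of each religion, the average heights); it only shows how a common cause produces a correlation that disappears once the data is split by that cause.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_per_region=2000, seed=1):
    """Region drives both religion and height; religion never affects height."""
    rng = random.Random(seed)
    # Hypothetical values: (share Protestant, mean height in cm) per region.
    regions = {"north": (0.8, 180.0), "south": (0.2, 172.0)}
    rows = []
    for region, (p_protestant, mean_height) in regions.items():
        for _ in range(n_per_region):
            protestant = 1 if rng.random() < p_protestant else 0
            height = rng.gauss(mean_height, 7.0)
            rows.append((region, protestant, height))
    return rows


rows = simulate()
overall = pearson([r[1] for r in rows], [r[2] for r in rows])
within = {
    region: pearson(
        [r[1] for r in rows if r[0] == region],
        [r[2] for r in rows if r[0] == region],
    )
    for region in ("north", "south")
}
print(f"overall r={overall:.2f}")
print({region: round(r, 2) for region, r in within.items()})
```

Over the whole data set religion and height are clearly correlated, while inside each region the correlation is close to zero. The relationship is real as a pattern in the data, but it says nothing about one causing the other.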

Sample correct interpretation
The interpretation by Abraham Wald made him famous. The damaged parts were not the real problem, as most people would think.
The data showed that there were similar patches on each returning B-29 where there was no damage from enemy fire, leading Wald to conclude that these patches were the weak spots that must be reinforced. This is still considered today seminal work in the then-fledgling discipline of operational research.

Spreadsheet horror stories

Spreadsheet horror stories world

The common approach is to use spreadsheets, Visual Basic and other tools that you can run on your own personal computer. No IT department needs to be involved, with all of its difficulties. You will also miss its opportunities, and those are the longer-term casualties. There is no check and double-check of the results or of the way of analysis. You can find a lot of easy mistakes in the news magazines. For spreadsheet usage I found some references. The biggest challenges are: ... how to manage all this information, ... how to get it into business culture.

Still being referred to, as in:

Spreadsheet horror stories NL

Modelers need to understand math

do-predictive-modelers-need-know-math (deanabbott) mar 2013
Working with confidence 95% - 5% intervals
The confidence interval is a statistical result related to the margin of error. When working with statistics, it only says something about the probability of being wrong in some way.

Even with the research set up correctly and all figures worked out correctly, there is still a probability of publishing a result that is wrong, or of leaving a correct result unpublished because the figures happened not to support it.
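What the 95% - 5% split means in practice can be sketched with a simulation. The example below is an illustration under stated assumptions: normally distributed data with a known standard deviation, so that the interval "sample mean plus or minus 1.96 * sigma / sqrt(n)" applies. The function name and all parameter values are chosen for the example.

```python
import math
import random


def ci_coverage(n_experiments=2000, n=30, true_mean=0.0, sigma=1.0, seed=1):
    """Share of repeated experiments whose 95% interval contains the true mean."""
    rng = random.Random(seed)
    # Half-width of the 95% interval when sigma is known.
    half_width = 1.96 * sigma / math.sqrt(n)
    covered = 0
    for _ in range(n_experiments):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs(sample_mean - true_mean) <= half_width:
            covered += 1
    return covered / n_experiments


coverage = ci_coverage()
print(f"intervals containing the true mean: {coverage:.1%}")
```

Every simulated experiment here is carried out flawlessly, yet about one in twenty intervals still misses the true value. A single published interval gives no indication of whether it belongs to that unlucky 5%.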

The ethics of research on humans are not to be ignored: wiki/Guidelines_for_human_subject_research

Not accepting another view

tax prediction

justice cases

Persons Modern times I
Nassim Nicholas Taleb (1960), Lebanese-American essayist, scholar and statistician, whose work focuses on problems of randomness, probability and uncertainty. His 2007 book The Black Swan was described in a review by the Sunday Times as one of the twelve most influential books since World War II.

