Statistics - Analytics
Statistics is commonly misunderstood, because the word can mean several things:
- Descriptive statistics. As measurement it is very visible and, within its context, trustworthy.
- Mathematical statistical theory. Being theoretical it is not visible, but it is trustworthy because it must be proved.
- Statistical proof of research assumptions. This uses statistical theory but introduces uncertainty.
- The machine learning approach. Models rely on all kinds of assumptions and use statistical procedures to work toward results that carry a level of uncertainty.
Contents - Introduction
Statistics and analytics have their origin in mathematics.
A long historical background exists with:
- philosophic connection
- measuring the human environment
- predicting the unknown
In the glossary, references to the meaning of words are collected: Analytics - mathematics
All references on abuse and (mis)interpretation:
Data dredging (data fishing, data snooping) is the inappropriate (sometimes deliberately so) use of data mining to uncover misleading relationships in data. Data-snooping bias is a form of
statistical bias that arises from this misuse of statistics. Any relationships found might appear valid within the test set but they would have no statistical significance in the wider population.
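A minimal sketch of how data dredging produces "findings" out of pure noise. All numbers here are assumptions chosen for illustration: one random target variable, 200 unrelated random candidates, 30 observations each, and the usual two-sided 5% critical value of the correlation coefficient for 30 observations (about 0.361). The function names `pearson_r` and `dredge` are hypothetical helpers, not taken from any source.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def dredge(n_samples=30, n_candidates=200, threshold=0.361, seed=1):
    """Correlate one noise variable against many unrelated noise variables.

    threshold is the assumed two-sided 5% critical value of r for 30
    observations. Every "hit" returned is a false discovery, because all
    variables are independent by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = []
    for i in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson_r(target, candidate)
        if abs(r) > threshold:
            hits.append((i, r))
    return hits


if __name__ == "__main__":
    hits = dredge()
    print(f"{len(hits)} of 200 unrelated variables look 'significant'")
    for index, r in hits:
        print(f"  candidate {index}: r = {r:+.2f}")
```

With a 5% threshold roughly one in twenty candidates is expected to pass by chance alone. Reporting only those hits, without mentioning the other candidates that were tried, is the misuse described above.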
I'm working mostly on other pages at this moment (Aug 2012).
I found that this topic deserved a dedicated page. Links and paragraphs will be moved here.
For most of the time this page will be a mash-up. As soon as I see the hit ratio grow I will do a clean-up.
Correlation - causation.
- Correlation may not imply causation, but it sure can help us insinuate it.
You could call it pride & prejudice.
Tools - missing quality
The purpose of this paper is to evaluate the accuracy of MS Excel in terms of statistical procedures and to conclude whether MS Excel should be used for (statistical) scientific purposes or not. The evaluation is made for Excel versions 97, 2000, XP and 2003.
According to the literature, there are three main problematic areas for Excel if it is used for statistical calculations.
Many things have probably improved at Microsoft since then.
Wrong analyses: Boeing 757
stating missed aging of airplanes
Most statistics require an evenly or normally distributed pattern of data. “Normally distributed” means that basic hill shape we envision, like the following coin flip graph illustrates.
However, many (most?) types of investment data are not evenly or normally distributed. Applying standard statistics to such information produces unusable results, yet these are often cited as meaningful and as the basis of some conclusion.
One of the most famous sayings about statistics is the line: “There are three kinds of lies: lies, damned lies, and statistics.” It was popularized by author Mark Twain (Samuel Clemens), who attributed it to British statesman Benjamin Disraeli. There is also a book entitled “How to Lie with Statistics”.
Big data - too big expectations - noise
Getting trapped by perfect information
Some publications claiming to be science have gone too far with the truth.
The Dutch news I have followed:
Use & Abuse of statistics - (mis)interpretation
Publishing articles (KPI)
Design and analysis of group-randomized trials in cancer: a review of current practices.
Only 18 (24%) of the 75 articles documented appropriate methods for sample size calculations. Only 34 (45%) limited their reports to analyses judged to be appropriate. Fully 26 (34%) failed to report any analyses that were judged to be appropriate. The most commonly used inappropriate analysis was an analysis at the individual level that ignored the groups altogether. Nine articles (12%) did not provide sufficient information.
comment to: Ben Goldacre - Bad Pharma
Interpretation of relationship (correlation)
Sample: body length
Protestantism is concentrated in the north and east of Europe and Roman Catholicism in the south and west, a result of the Reformation.
The people of Northern Europe are taller than those of Southern Europe. Human height
There is strong statistical evidence of a relationship between human height and religion.
No one will believe (I hope) that changing religion will influence your body height.
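The height and religion example can be sketched as a small simulation with a confounder. Everything numeric below is an assumption made up for illustration, not real demographic data: the region decides both the chance of being Protestant (80% north, 20% south) and the average height (180 cm north, 172 cm south), while height and religion are independent inside each region. The names `simulate` and `height_gap` are hypothetical.

```python
import random
import statistics


def simulate(n=20000, seed=7):
    """Return (north, protestant, height_cm) tuples.

    Assumed, invented parameters: region drives both religion and height;
    religion never influences height directly.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        north = rng.random() < 0.5
        protestant = rng.random() < (0.8 if north else 0.2)
        height = rng.gauss(180 if north else 172, 7)
        people.append((north, protestant, height))
    return people


def height_gap(people):
    """Mean height of Protestants minus mean height of Catholics."""
    prot = [h for _, p, h in people if p]
    cath = [h for _, p, h in people if not p]
    return statistics.fmean(prot) - statistics.fmean(cath)


if __name__ == "__main__":
    people = simulate()
    north = [p for p in people if p[0]]
    south = [p for p in people if not p[0]]
    print(f"gap over all of Europe: {height_gap(people):+.1f} cm")
    print(f"gap within the north:   {height_gap(north):+.1f} cm")
    print(f"gap within the south:   {height_gap(south):+.1f} cm")
```

Over the whole population the two groups differ by several centimetres, yet inside each region the gap is close to zero. The relationship is real as a correlation, but it is carried entirely by the region.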
Sample: correct interpretation
The interpretation of Abraham Wald made him famous. Contrary to what most people would think, the damaged parts were not the real problem.
The data showed that there were similar patches on each returning B-29 where there was no damage from enemy fire, leading Wald to conclude that these patches were the weak spots that must be reinforced.
This is still considered today seminal work in the then-fledgling discipline of operational research.
Spreadsheet horror stories
Spreadsheet horror stories world
The common approach is to use spreadsheets, Visual Basic and other tools that you can run on your own on your personal computer.
No IT department needs to be involved, with all of its difficulties. You will also miss its opportunities; these are the more long-term casualties.
There is no check and double-check of the results or of the way of analysis. You can find a lot of easy mistakes in the news magazines.
For spreadsheet usage I found some references.
The biggest challenges are: ... how to manage all this information, ... how to get it into business culture.
Still being referred to, as in:
Spreadsheet horror stories NL
Modelers need to understand math
Working with confidence 95% - 5% intervals
The Confidence interval is a statistical result related to the Margin of error. Working with statistics, this only says something about the probability of being wrong in some way.
Even with the research set up correctly and all figures worked out correctly, there is still a probability of publishing a result that is wrong, or of leaving a result unpublished because the figures wrongly failed to support it.
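A small sketch of what the 95% - 5% split means in practice. It repeats the same correctly executed study many times and counts how often the 95% interval contains the true value. The setup is assumed for illustration only: a true mean of 100, a standard deviation of 15, 50 observations per study, and the normal approximation (z = 1.96) for the interval. The function name `coverage` is hypothetical.

```python
import random
import statistics


def coverage(n_studies=2000, n=50, true_mean=100.0, sd=15.0, seed=3):
    """Fraction of simulated studies whose 95% interval covers the true mean.

    Each study is done "correctly": a random sample, the sample mean, and
    an interval of mean +/- 1.96 standard errors (normal approximation,
    which is slightly optimistic for a sample of 50).
    """
    rng = random.Random(seed)
    z = 1.96
    covered = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        std_err = statistics.stdev(sample) / n ** 0.5
        if mean - z * std_err <= true_mean <= mean + z * std_err:
            covered += 1
    return covered / n_studies


if __name__ == "__main__":
    share = coverage()
    print(f"intervals containing the true mean: {share:.1%}")
    print(f"studies that miss it anyway:        {1 - share:.1%}")
```

Roughly one study in twenty misses the true value even though nothing was done wrong. The interval describes how often the procedure fails; it does not say which individual study is the failed one.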
The ethics of research on humans is not to be ignored: wiki/Guidelines_for_human_subject_research
Not accepting another view
| Persons || Modern times I |
| Nassim Nicholas Taleb || (1960) Lebanese American essayist, scholar and statistician, whose work focuses on problems of randomness, probability and uncertainty. His 2007 book The Black Swan was described in a review by the Sunday Times as one of the twelve most influential books since World War II |
© 2012 J.A.Karman (21 apr 2012)