Incorrect use of Statistics
All references on the abuse and (mis)interpretation of statistics are at the dedicated page:
Abuse of statistics
I'm working mostly on other pages at the moment (Aug 2012).
I found that this topic deserves a dedicated page. Links and paragraphs will be moved here.
For the time being this page will be a mash-up. As soon as I see the hit ratio grow I will do a clean-up.
Context word Statistics
Statistics is commonly misunderstood, as the word can mean:
- Descriptive statistics. As measurements these are very visible and, in their context, trustworthy.
- Mathematical statistical theory. As it is theoretical it is not visible, but it is trustworthy because it must be proven.
- Statistical proof of research assumptions. This uses the statistical theory but introduces uncertainty.
- The machine learning approach, where models use all kinds of assumptions and some statistical procedures, working towards results with a level of uncertainty.
References
Information Technology
This technology area is as challenging as statistics & analytics.
Analytics - Mathematics
As it is based on a part of mathematics, this exact-science ("beta") way of thinking must be understood.
Not only that, there are many mandatory regulations in many areas that have to be taken into account.
Content awareness
Some areas with strict regulations.
Descriptive Statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics),
in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are thought to represent.
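As a minimal sketch of what "summarizing a data set" means in practice, the lines below compute a few common descriptive measures with Python's standard library. The sample values are invented purely for illustration; nothing is inferred about any wider population.

```python
import statistics

# Hypothetical sample, invented for illustration only.
sample = [12, 15, 11, 19, 15, 22, 15, 18]

# Descriptive statistics: describe the data at hand, nothing more.
summary = {
    "n": len(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
    "stdev": statistics.stdev(sample),  # sample standard deviation (n - 1)
    "min": min(sample),
    "max": max(sample),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```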
References Descriptive Statistics
All references to this kind of data are collected at a dedicated page:
Descriptive - Data
Data Mining
This is more a collection of all kinds of goals and techniques.
Calculation, measures, units
Calculation is the basic step in
Mathematics .
Many systems of measures have been used. They have been made uniform with the
Decimal approach.
Numbers like 12 (2, 3, 4, 6) and 60 (2, 3, 4, 5, 6, ...) have more divisors than 10 (2, 5). Even so, base-10 calculation has become commonly accepted.
The only exception is the technical computer approach, which is binary based; the notation used is mostly hexadecimal.
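The divisibility remark is easy to check. The sketch below lists the divisors of 10, 12 and 60 and prints one value in binary and hexadecimal notation. The helper name divisors is my own, chosen for this illustration.

```python
def divisors(n: int) -> list[int]:
    """Return the divisors of n, leaving out 1 and n itself."""
    return [d for d in range(2, n) if n % d == 0]

for base in (10, 12, 60):
    print(base, divisors(base))
# 10 [2, 5]
# 12 [2, 3, 4, 6]
# 60 [2, 3, 4, 5, 6, 10, 12, 15, 20, 30]

# The same number in the computer notations: binary and hexadecimal.
value = 60
print(bin(value), hex(value))  # 0b111100 0x3c
```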
0_(number) The Greeks and Romans did not use a decimal system with a placeholder. The decimal system has another origin.
With a decimal system, measures and calculations can be strongly simplified. This is the
Metric_system .
Numerical_analysis
Trigonometry
has not been touched that much by the decimal approach.
Radians (based on pi) have more influence.
Angles are still in degrees, with 360 for a full circle, or in radians, with 2pi for a full circle. A French approach (gradians) divides the full circle into 400.
The measurement of the earth has become very accurate with GPS.
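The three ways of measuring a full circle (360 degrees, 2pi radians, and the 400-based French unit known as gradians) convert into each other with simple ratios. A small sketch follows; degrees_to_gradians is a helper written for this page, not a library function.

```python
import math

def degrees_to_gradians(deg: float) -> float:
    """Convert degrees (360 per circle) to gradians (400 per circle)."""
    return deg * 400.0 / 360.0

full_circle = 360.0
print(math.radians(full_circle))         # 2 * pi
print(degrees_to_gradians(full_circle))  # 400.0
print(degrees_to_gradians(90.0))         # 100.0, a right angle
```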
Time and the calendar did not change to a metric system, although it has been tried:
French_Republican_Calendar
We are still using hours of 60 minutes and minutes of 60 seconds. This works out conveniently with locations (GIS) and positions/time on earth.
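A short sketch of how the 60-based units show up with positions: coordinates in degrees, minutes and seconds convert to decimal degrees by dividing by 60 and 3600, and because the earth turns 360 degrees in 24 hours, 15 degrees of longitude correspond to one hour of solar time. The function names and the example position are made up for illustration.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float,
                   negative: bool = False) -> float:
    """Convert degrees/minutes/seconds to decimal degrees.

    Set negative=True for southern latitudes or western longitudes.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if negative else value

def longitude_to_solar_hours(longitude_deg: float) -> float:
    """360 degrees per 24 hours means 15 degrees per hour."""
    return longitude_deg / 15.0

# Hypothetical position: 52 degrees 30 minutes 0 seconds.
print(dms_to_decimal(52, 30, 0))       # 52.5
print(longitude_to_solar_hours(15.0))  # 1.0
```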
Greek fundamentals
Pythagoras
The fundamentals of the western world come from the ancient Greeks. The most famous is:
Pythagoras of Samos
Pythagoras ho Samios "Pythagoras the Samian", b. about 570 – d. about 495 BC was an Ionian Greek philosopher, mathematician, and founder of the religious movement called Pythagoreanism.
Most of the information about Pythagoras was written down centuries after he lived, so very little reliable information is known about him.
Aristotle & Plato
These ancient Greek philosophers already stated the problem with analytics.
theory_of_universals Aristotle to
Platonic realism
Although modelling data looks to be mathematically proven, there is uncertainty.
The way of doing research on data can even be more art (human interpretation) than real evidence.
Industrial revolution
Statistics history
Before the middle of the 1900s, statistics meant observed data and descriptive summary figures, such as means, variances, indices, etc., computed from data.
Thomas_Bayes laid the groundwork for developing the theory of what is now
Bayesian_statistics and
Bayesian_probability
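The core of the Bayesian approach is updating a prior belief with evidence: P(H|E) = P(E|H) P(H) / P(E). The sketch below works that out for a test result. All three input probabilities are invented for illustration, and the function name posterior is my own.

```python
def posterior(prior: float, p_evidence_if_true: float,
              p_evidence_if_false: float) -> float:
    """Bayes' theorem for a hypothesis H and one piece of evidence E."""
    # Total probability of seeing the evidence at all.
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence

# Hypothetical numbers: 1% prior, 95% detection rate, 5% false alarms.
p = posterior(prior=0.01, p_evidence_if_true=0.95, p_evidence_if_false=0.05)
print(round(p, 3))  # 0.161
```

Even with a test that looks accurate, the low prior keeps the posterior around 16%. This kind of outcome is often misread.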
The other important person is:
Ronald_Fisher with
Fisher's_exact_test
There is some debate about whether to use a Bayesian approach or Fisher's.
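For comparison, Fisher's exact test on a 2x2 table can be sketched with the standard library only. With the row and column totals fixed, the table probabilities follow the hypergeometric distribution, and the two-sided p-value used here sums all tables that are no more likely than the observed one. That is one common convention, not the only one. The counts in the example are illustrative, not real data.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so equally likely tables are not lost to rounding.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Illustrative table: 3 of 4 right in one group, 1 of 4 in the other.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```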
Operational research
Operational research
Operations research, or Operational Research in British usage, is a discipline that deals with the application of advanced analytical methods to help make better decisions[1]. It is often considered to be a sub-field of Mathematics. The terms management science and decision science are sometimes used as more modern-sounding synonyms.
SQL scoring
Oracle 10gr "These single-row functions support classification, regression, anomaly detection, clustering, and feature extraction."
MS Excel is also mentioned.
Statistical_classification "In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known."
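To make the definition concrete, here is a minimal sketch of one very simple classifier, 1-nearest neighbour: a new observation gets the category of the closest observation in the training set. It is only one of many classification techniques, and the training data and category names are invented for illustration.

```python
import math

# Hypothetical training set: (feature vector, known category).
training = [
    ((1.0, 1.2), "low"),
    ((0.8, 0.9), "low"),
    ((3.1, 3.0), "high"),
    ((2.9, 3.4), "high"),
]

def classify(observation: tuple[float, float]) -> str:
    """Return the category of the nearest training observation."""
    _, label = min(training,
                   key=lambda item: math.dist(item[0], observation))
    return label

print(classify((1.1, 1.0)))  # low
print(classify((3.0, 3.2)))  # high
```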
Data_Mining_Extensions (DMX) is a query language for Data Mining Models supported by Microsoft's SQL Server Analysis Services product.
PMML
The Predictive Model Markup Language (PMML) is an XML-based markup language developed by the Data Mining Group (DMG) to provide a way for applications to
define models related to predictive analytics and data mining and to share those models between PMML-compliant applications.
The Data Mining Group The Data Mining Group (DMG) is an independent, vendor led consortium that develops data mining standards, such as the Predictive Model Markup Language.
Disappointing is how old the dates mentioned there are (2010).
Predicting the future - PMML
PMML is a standard to help deploy (score) data mining models
Part 1 offered a general overview of predictive analytics. Part 2 focused on predictive modeling techniques, the mathematical algorithms that make up the core of predictive analytics. Part 3 put those techniques to use and described the making of a predictive solution.
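What "deploying (scoring)" a PMML model can look like is sketched below: a small regression model is read from XML and applied to one input record. The PMML fragment is hand-written for this page (a hypothetical model y = 1.5 + 2.0 * x) and has not been validated against the DMG schema. The function name score is my own. A real deployment would use a PMML-compliant scoring engine instead.

```python
import xml.etree.ElementTree as ET

PMML_NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical, hand-written model: y = 1.5 + 2.0 * x
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="hypothetical example model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

def score(pmml_text: str, inputs: dict[str, float]) -> float:
    """Apply the first regression table found in the document."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", PMML_NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", PMML_NS):
        coefficient = float(predictor.get("coefficient"))
        result += coefficient * inputs[predictor.get("name")]
    return result

print(score(PMML_DOC, {"x": 3.0}))  # 7.5
```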
PMML sources
top-10-pmml-resources (predictive-analytics.info)
Big data sources
big_data_press_release (whitehouse)
big-data-rd-initiative (2012/03/29 cccblog)
(Wall Street Journal)
Creating financial models involving human behavior is like forcing 'the ugly stepsister's foot into Cinderella's pretty glass slipper.'
analytics-india-jobs-study (analyticsindiamag 2012)
choosing_a_good (graphs 2006/09)
real-time-analytics-basics-bayesian (predictive-models 2012/07)
real-time-analytics-bayesian-part-2 (predictive-models 2012/08)
Gamification
Gamification Gamification is the use of game thinking and game mechanics in a non-game context in order to engage users and solve problems. Gamification is used in applications and processes to improve user engagement, Return on Investment, data quality, timeliness, and learning.
It touches on the social aspects of human relations.
advanced usage IT
The game industry has always been one of the first advanced users of IT resources.
Game Studios at the Forefront of Big Data, Cloud (slashdot)
For Riot Games, Big Data Is Serious Business (slashdot 2012)
Collection
Choosing and playing random, or not being random.
Random numbers
Generating good random numbers is an everlasting question.
Mersenne_twister
Wichmann-Hill
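Both generators can be tried out quickly. Python's standard random module uses the Mersenne Twister as its core generator. The Wichmann-Hill class below is my own minimal sketch of the 1982 algorithm (three small linear congruential generators combined), with arbitrary default seeds. It is meant to show the idea and is not suitable for serious work.

```python
import random

# Mersenne Twister: the generator behind Python's `random` module.
rng = random.Random(2012)  # fixed seed, so the sequence is reproducible
print([rng.randint(1, 6) for _ in range(5)])

class WichmannHill:
    """Minimal sketch of the Wichmann-Hill (1982) generator."""

    def __init__(self, s1: int = 1, s2: int = 2, s3: int = 3):
        # Seeds are arbitrary example values; each should be non-zero.
        self.s1, self.s2, self.s3 = s1, s2, s3

    def random(self) -> float:
        """Return the next pseudo-random float in [0, 1)."""
        self.s1 = (171 * self.s1) % 30269
        self.s2 = (172 * self.s2) % 30307
        self.s3 = (170 * self.s3) % 30323
        return (self.s1 / 30269 + self.s2 / 30307 + self.s3 / 30323) % 1.0

wh = WichmannHill()
print([round(wh.random(), 4) for _ in range(5)])
```

Both are pseudo-random: the same seed always gives the same sequence, which is useful for repeatable research but unsuitable for anything security related.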
Benford distribution of numbers
Under the conditions of real-world measurements the numbers themselves are not random.
Benford's_law (wiki)
How a Simple Misconception can Trip up a Fraudster and How a Savvy CFE Can Spot It (acfe)
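Benford's law says the leading digit d appears with probability log10(1 + 1/d), so a 1 leads about 30% of the time and a 9 less than 5%. The sketch below compares that expectation with the leading digits of the powers of two, a sequence known to follow the law. Real fraud checks would of course use the actual figures under investigation.

```python
import math
from collections import Counter

# Expected share of each leading digit according to Benford's law.
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Observed leading digits of 2**1 .. 2**1000.
count = 1000
leading = Counter(int(str(2 ** k)[0]) for k in range(1, count + 1))
observed = {d: leading[d] / count for d in range(1, 10)}

for d in range(1, 10):
    print(d, f"expected {expected[d]:.3f}", f"observed {observed[d]:.3f}")
```

Invented numbers tend to have leading digits that are spread too evenly, which is what makes the comparison useful for spotting them.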
Six_Sigma
A standard for the number of allowed product defects. In fact it is 4.5 sigma.
Six_Sigma (wiki)
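The "in fact 4.5 sigma" remark can be checked with the normal distribution. The well-known Six Sigma figure of about 3.4 defects per million opportunities is the one-sided tail beyond 4.5 standard deviations, because a 1.5 sigma long-term shift of the process is assumed. The function name is my own.

```python
from statistics import NormalDist

def defects_per_million(sigma_level: float) -> float:
    """One-sided normal tail beyond sigma_level, per million opportunities."""
    return (1 - NormalDist().cdf(sigma_level)) * 1_000_000

print(round(defects_per_million(4.5), 1))  # 3.4
print(defects_per_million(6.0))            # far below 3.4
```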
Control charts, also known as Shewhart charts (after Walter A. Shewhart) or process-behavior charts, in statistical process control are tools used to determine if a manufacturing or business process is in a state of statistical control.
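A minimal sketch of the idea behind such a chart, here for individual measurements: the control limits are set at the mean plus or minus three sigma of a baseline period, with sigma estimated from the average moving range divided by the constant 1.128. New points outside the limits signal that the process may be out of control. All measurements are invented for illustration, and real charts apply additional run rules.

```python
from statistics import mean

def control_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) for individuals."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 1.128 is the d2 constant for moving ranges of two observations.
    sigma_hat = mean(moving_ranges) / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat

# Hypothetical baseline measurements from a stable period.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 10.1, 9.7, 10.0]
lcl, center, ucl = control_limits(baseline)
print(f"LCL {lcl:.2f}, center {center:.2f}, UCL {ucl:.2f}")

# Hypothetical new measurements checked against the limits.
new_points = [10.2, 10.9, 9.9]
flagged = [x for x in new_points if not lcl <= x <= ucl]
print(flagged)  # [10.9]
```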
credit fico
Scoring and modeling, whether internally or externally developed, are used extensively in credit card lending.
credit_card ch8 (fdic.gov)