home-BI     BI analytics     BI Tools Suppliers     statistics analyses     descriptive statistics     data at work     stat (ir)responsible     stat educational
Positioning    History    Bayes Fisher    Statistical Basics    Operational     Forecasting    Data Mining    Game     Algorithm     miscellaneous       top bottom

Statistics - Analytics


Incorrect use of Statistics
All references on abuse and (mis)interpretation of statistics are collected at the dedicated page: Abuse of statistics




Work in progress: I am mostly working on other pages at this moment (Aug 2012).
I found this topic deserved a dedicated page. Links and paragraphs will be moved here.

For the time being this page will be a mash-up. As soon as I see the hit ratio grow, I will do a clean-up.



Positioning & relations



Context word Statistics
Statistics is commonly misunderstood, as it can be:

References

This technology area is as challenging as statistics & analytics.

Information Technology

This technology area is as challenging as statistics & analytics.


Analytics - Mathematics

As analytics is based on a branch of mathematics, this exact-science way of thinking must be understood.
Not only that, there are many mandatory regulations in many areas to take notice of.


Content awareness

Some areas are subject to strict regulations.




Descriptive Statistics
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are thought to represent.

References Descriptive Statistics
All references to this kind of data are collected at a dedicated page:
Descriptive - Data

Probability theory
Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single occurrences or evolve over time in an apparently random fashion.





Data Mining

This is more a collection of all kinds of goals and techniques.




historical references





Calculation, measures, units

Calculation is the basic step in mathematics. Many systems of measures have been used; the decimal approach has made them common. Numbers like 12 (divisible by 2, 3, 4, 6) and 60 (2, 3, 4, 5, 6, ...) are more easily divisible than 10 (2, 5), yet base-10 calculation has become commonly accepted. The only exception is the technical computer approach, which is binary based; notations are mostly in hexadecimal.

0_(number): the Greeks and Romans did not use a decimal system with a placeholder. The decimal system has another origin.

With a decimal system, measures and calculations can be strongly simplified. This is a Metric_system . Numerical_analysis

Trigonometry has not been touched that much by the decimal approach.
Radians (based on pi) have more influence.
Angles are still in degrees, 360 for a full turn, or based on 2·pi. A French approach divides the full turn into 400 (gradians).
The measurement of the earth has become very accurate with GPS.

Time and the calendar did not change to a metric system, although it has been tried: French_Republican_Calendar
We still use hours of 60 minutes and minutes of 60 seconds. This works out conveniently with locations (GIS) and positions/time on earth.
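The relation between the angle units mentioned above (360 degrees, the French 400 gradians, 2·pi radians for a full turn) can be sketched in a few lines of Python; the function name is my own choice for illustration.

```python
import math

# Full turns in each unit: 360 degrees = 400 gradians (gon) = 2*pi radians.
FULL_TURN = {"deg": 360.0, "gon": 400.0, "rad": 2 * math.pi}

def convert_angle(value, frm, to):
    """Convert an angle between degrees, gradians, and radians."""
    return value * FULL_TURN[to] / FULL_TURN[frm]

print(convert_angle(90, "deg", "gon"))   # a right angle is 100 gon
print(convert_angle(180, "deg", "rad"))  # pi radians
```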

Greek fundamentals

Pythagoras
The fundamentals of the Western world come from the ancient Greeks. The most famous is:
Pythagoras of Samos Pythagoras ho Samios "Pythagoras the Samian", b. about 570 – d. about 495 BC was an Ionian Greek philosopher, mathematician, and founder of the religious movement called Pythagoreanism. Most of the information about Pythagoras was written down centuries after he lived, so very little reliable information is known about him.

Aristotle & Plato
These ancient Greek philosophers already stated the problem with analytics: theory_of_universals, from Aristotle to Platonic realism.

Although modelling data looks mathematically proven, there is uncertainty.
The way of doing research on data can even be more art (human interpretation) than real evidence.




Bayes Fisher





Industrial revolution

Statistics history
Before the mid-1900s, statistics meant observed data and descriptive summary figures, such as means, variances, indices, etc., computed from data.

Thomas_Bayes did the initial work in developing the theory behind Bayesian_statistics and Bayesian_probability

The other important person is: Ronald_Fisher with Fisher's_exact_test

There are ongoing debates about using a Bayesian approach versus Fisher's.
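The difference between the two schools can be seen in a toy example (my own illustration, not from any of the references): estimating a coin's probability of heads after observing 7 heads in 10 flips.

```python
# Estimating a coin's probability of heads from 7 heads in 10 flips,
# the frequentist way and the Bayesian way (toy illustration).
heads, flips = 7, 10

# Fisher-style point estimate: the maximum-likelihood estimate.
mle = heads / flips

# Bayesian estimate: a uniform Beta(1, 1) prior updates to a
# Beta(1 + heads, 1 + tails) posterior with mean (heads + 1) / (flips + 2).
posterior_mean = (heads + 1) / (flips + 2)

print(mle, posterior_mean)  # 0.7 versus about 0.667
```

The Bayesian answer is pulled slightly toward the prior's 0.5; with more data the two estimates converge.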

Statistics references mentioning Fisher and Bayes
Essentials of Paleomagnetism: Web Edition 1.0 (March 18, 2009) (magician.ucsd.edu) Paleomagnetists have depended since the 1950’s on the special statistical framework developed by Fisher (1953) for the analysis of unit vector data.

Encyclopedia_of_computational_neuroscience (scholarpedia.org)




Statistical Basics




Probability

On Wikipedia a lot of descriptions can be found.

Mean
In statistics, mean has two related meanings:
the arithmetic mean (and is distinguished from the geometric mean or harmonic mean).
the expected value of a random variable, which is also called the population mean.


Median
In statistics and probability theory, the median is described as the numerical value separating the higher half of a sample, a population, or a probability distribution from the lower half.

Normal
In probability theory, the normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve.

Poisson
In probability theory and statistics, the Poisson distribution (pronounced [pwasɔ̃]) is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

Uniform
In probability theory and statistics, the discrete uniform distribution is a probability distribution whereby a finite number of equally spaced values are equally likely to be observed; every one of n values has equal probability 1/n.

Skewness
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable.

Kurtosis
In probability theory and statistics, kurtosis is any measure of the "peakedness" of the probability distribution of a real-valued random variable. In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution and, just as for skewness, there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample from a population.

Chi-squared
In probability theory and statistics, the chi-squared distribution (also chi-square or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.

F-test
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.
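Several of the basic measures above (mean, median, standard deviation) are in Python's standard library; skewness needs a few extra lines. A small sketch with a made-up data set:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # value separating the lower and upper half
stdev = statistics.pstdev(data)   # population standard deviation

# Skewness as the standardized third moment (no stdlib one-liner for this).
n = len(data)
skewness = sum((x - mean) ** 3 for x in data) / (n * stdev ** 3)

print(mean, median, stdev, skewness)
```

For this data set the mean is 5, the median 4.5, and the positive skewness reflects the long tail to the right.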




hidden variable
A confounding variable (also confounding factor, hidden variable, lurking variable, a confound, or confounder) is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable.

collinear, hidden variable
In geometry, collinearity is a property of a set of points, specifically the property of lying on a single line. A set of points with this property is said to be collinear (sometimes spelled co-linear or colinear).
Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data.

ANOVA, correlation, clustering

Mahalanobis distance
In statistics, the Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on correlations between variables, by which different patterns can be identified and analyzed.
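For two variables the Mahalanobis distance can be computed by hand: estimate the covariance matrix, invert it, and evaluate the quadratic form. A minimal sketch in plain Python, with made-up data:

```python
import math

# Two correlated variables, as small Python lists (made-up data).
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 5.0, 9.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Population covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum((x - mx) ** 2 for x in xs) / n
syy = sum((y - my) ** 2 for y in ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Invert the 2x2 covariance matrix.
det = sxx * syy - sxy * sxy
ixx, iyy, ixy = syy / det, sxx / det, -sxy / det

def mahalanobis(x, y):
    """Mahalanobis distance of the point (x, y) from the sample mean."""
    dx, dy = x - mx, y - my
    return math.sqrt(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)

print(mahalanobis(2.0, 1.0))
```

Unlike Euclidean distance, a point far along the direction in which the data already varies counts as less unusual than one the same Euclidean distance away but against the correlation.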






Operational

Decisions Operations




Operational research
Operations research, or Operational Research in British usage, is a discipline that deals with the application of advanced analytical methods to help make better decisions. It is often considered to be a sub-field of mathematics. The terms management science and decision science are sometimes used as more modern-sounding synonyms.

MPS, LP (linear programming), AHP
decisions
OODA loop
The OODA loop (for observe, orient, decide, and act) is a concept originally applied to the combat operations process, often at the strategic level in military operations. It is now also often applied to understand commercial operations and learning processes.

Glossary notes, miscellaneous
Benford distribution of numbers
Six_Sigma
-
credit fico
Monte Carlo, Las Vegas
A randomized algorithm is an algorithm which employs a degree of randomness as part of its logic. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random bits. Formally, the algorithm's performance will be a random variable determined by the random bits; thus either the running time, or the output (or both) are random variables.
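A classic Monte Carlo example (my own sketch, not from the page's references) is estimating pi from random points: the output is approximate, but more random samples usually bring it closer to the truth.

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo estimate of pi: 4 times the fraction of random points
    in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / samples

print(estimate_pi(100_000))  # close to 3.14, not exact
```

This is the Monte Carlo flavour (fast, possibly slightly wrong answer); a Las Vegas algorithm, by contrast, always returns a correct answer but takes a random amount of time.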




Forecasting




Arima
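ARIMA models combine autoregressive (AR) and moving-average (MA) terms. As a minimal sketch of the AR side only (assumptions: an AR(1) model, coefficients fitted by ordinary least squares; function names are my own):

```python
def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    c = my - phi * mx
    return c, phi

def forecast(series, steps, c, phi):
    """Iterate the fitted recurrence forward from the last observation."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

history = [10.0, 12.0, 11.0, 13.0, 12.5, 14.0, 13.5]
c, phi = fit_ar1(history)
print(forecast(history, 3, c, phi))
```

Real ARIMA fitting adds differencing (the "I") and moving-average terms, and is better left to a statistics package.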





Data Mining




Some dedicated techniques are used:
SVM: machine_learning (wiki) Explained (tristanfletcher) Shogun_(toolbox)
Neural_network: software , Artificial
Vector_autoregression: (1 wiki)
CHAID
CHAID is a type of decision tree technique, based upon adjusted significance testing (Bonferroni testing).
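CHAID (CHi-squared Automatic Interaction Detection) chooses its splits with chi-squared tests of independence. The core of that test, the Pearson chi-squared statistic for a contingency table, is small enough to sketch directly (the table values are made up):

```python
def chi2_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Buyers vs non-buyers split by some candidate attribute (made-up numbers).
print(chi2_statistic([[20, 30], [30, 20]]))  # 4.0
```

The larger the statistic, the stronger the association between the row and column variables, and the more attractive the split; CHAID additionally adjusts the resulting p-values (Bonferroni) for the many candidate splits it tries.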



Analytics analyse (inferential)

CRISP-DM
The CRISP-DM methodology is described in terms of a hierarchical process model, consisting of sets of tasks described at four levels of abstraction (from general to specific): phase, generic task, specialized task, and process instance.

At the top level, the data mining process is organized into a number of phases; each phase consists of several second-level generic tasks. This second level is called generic because it is intended to be general enough to cover all possible data mining situations.


SEMMA
SEMMA is an acronym that stands for Sample, Explore, Modify, Model and Assess. It is a list of sequential steps developed by SAS Institute Inc., one of the largest producers of business intelligence software, to guide the implementation of data mining applications. Although SEMMA is often considered a general data mining methodology, SAS claims that it is "rather a logical organisation of the functional tool set of" one of their products, SAS Enterprise Miner, "for carrying out the core tasks of data mining".


SQL scoring
Oracle 10g: "These single-row functions support classification, regression, anomaly detection, clustering, and feature extraction." MS Excel is also mentioned.

Statistical_classification In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

Data_Mining_Extensions (DMX) is a query language for Data Mining Models supported by Microsoft's SQL Server Analysis Services product.
PMML
The Predictive Model Markup Language (PMML) is an XML-based markup language developed by the Data Mining Group (DMG) to provide a way for applications to define models related to predictive analytics and data mining, and to share those models between PMML-compliant applications.
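The idea of scoring a shared model can be sketched with a hand-written fragment in the spirit of a PMML regression model (illustrative only: real PMML documents carry namespaces, a DataDictionary, a MiningSchema, and much more; field names here are my own example):

```python
import xml.etree.ElementTree as ET

# A minimal, hand-made fragment in the style of a PMML regression model.
doc = """
<PMML version="4.1">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.2"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

def score(xml_text, row):
    """Evaluate the regression table against a dict of input values."""
    table = ET.fromstring(xml_text).find(".//RegressionTable")
    value = float(table.get("intercept"))
    for pred in table.findall("NumericPredictor"):
        value += float(pred.get("coefficient")) * row[pred.get("name")]
    return value

print(score(doc, {"age": 30, "income": 50_000}))  # 1.5 + 6.0 + 50.0 = 57.5
```

The point of the standard is exactly this: the model travels as a document, and any compliant consumer can score it without the tool that trained it.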

The Data Mining Group (DMG) is an independent, vendor-led consortium that develops data mining standards, such as the Predictive Model Markup Language.
It is disappointing that only old years (2010) are mentioned.


Predicting the future - PMML

PMML is a standard that helps deploy (score) data mining models.
Part 1 offered a general overview of predictive analytics. Part 2 focused on predictive modeling techniques, the mathematical algorithms that make up the core of predictive analytics. Part 3 put those techniques to use and described the making of a predictive solution.


PMML sources
top-10-pmml-resources (predictive-analytics.info)


Big data sources
big_data_press_release (whitehouse)
big-data-rd-initiative (2012/03/29 cccblog)
(wallstreet journal) Creating financial models involving human behavior is like forcing "the ugly stepsister's foot into Cinderella's pretty glass slipper."
analytics-india-jobs-study (analyticsindiamag 2012)
choosing_a_good (graphs 2006/09)
real-time-analytics-basics-bayesian (predictive-models 2012/07)
real-time-analytics-bayesian-part-2 (predictive-models 2012/08)


Links / subjects
Correlation_does_not_imply_causation
Data_virtualization (big data)
-
Predicting the future - PMML
Treatment of missing data


Ensemble Models





Game




Games Theory




Classic theory

Game_theory
Decision making
Probability


Gamification

Gamification is the use of game thinking and game mechanics in a non-game context in order to engage users and solve problems. Gamification is used in applications and processes to improve user engagement, return on investment, data quality, timeliness, and learning.

It goes into the social aspects of human relations.




Games Simple




Simple games

Choosing and playing random, or not being random.
Three_door_problem (wiki)
Good old Monty Hall! Or, All Probability Is Conditional (wmbriggs)
wheel-of-mythfortune (mythbusters)
Rock-paper-scissors
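The three-door (Monty Hall) problem from the links above is easy to check by simulation; a small sketch (my own code), showing why switching wins about two thirds of the time:

```python
import random

def monty_hall(trials, switch, seed=1):
    """Simulate the Monty Hall game; return the observed win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(100_000, switch=True))   # about 2/3
print(monty_hall(100_000, switch=False))  # about 1/3
```

The intuition: switching wins exactly when the first pick was wrong, which happens with probability 2/3.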




Games Industry




advanced usage IT

The game industry has always been one of the first advanced users of IT resources. Game Studios at the Forefront of Big Data, Cloud (slashdot) For Riot Games, Big Data Is Serious Business (slashdot 2012)






Algorithm

Glossary notes, miscellaneous
Collection
Random numbers
-



Defining Algorithm
Algorithm . Algorithm_characterizations

Divide_and_conquer_algorithm .

Fast clustering algorithms for massive datasets
Clustering with text (bigdatganews) .



Page Ranking


Corrections

Likelihood of false positives
Bonferroni .

Time shifting lag


Leverage


Collection
Choosing and playing random, or not being random.
Quicksort (wiki)
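Quicksort is also a nice example of "choosing random, or not": picking the pivot at random makes it a Las Vegas algorithm, where the answer is always correct and only the running time is random. A minimal sketch (my own code, the simple non-in-place variant):

```python
import random

def quicksort(items):
    """Randomized quicksort: a random pivot gives expected O(n log n) time
    regardless of the input order."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```

With a fixed pivot (say, the first element), an adversary can feed a sorted list and force quadratic time; the random pivot removes that worst case in expectation.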




miscellaneous




Numbers

Random numbers

Generating good random numbers is an everlasting question.
Mersenne_twister   Wichmann-Hill  
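CPython's own `random` module is a Mersenne Twister implementation; these are pseudo-random generators, so a fixed seed reproduces the exact same stream, which matters for repeatable analyses:

```python
import random

# CPython's random module uses the Mersenne Twister.
rng = random.Random(2012)       # seeding makes the stream reproducible
sample_a = [rng.randint(1, 6) for _ in range(5)]

rng.seed(2012)                  # re-seed: the exact same stream comes back
sample_b = [rng.randint(1, 6) for _ in range(5)]

print(sample_a == sample_b)  # True
```

Pseudo-random generators like this are fine for simulation and sampling, but not for cryptography, where an unpredictable source is required.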


Benford distribution of numbers

Under the conditions of real measurements, the numbers themselves are not uniformly distributed.
Benford's_law (wiki)   How a Simple Misconception can Trip up a Fraudster and How a Savvy CFE Can Spot It (acfe)  
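Benford's law predicts that leading digit d occurs with frequency log10(1 + 1/d); the powers of two are a classic data set that follows it closely, as a quick check shows:

```python
import math
from collections import Counter

# Leading digits of the powers of two follow Benford's law quite closely.
leading = Counter(int(str(2 ** n)[0]) for n in range(1, 1001))

for d in range(1, 10):
    observed = leading[d] / 1000
    expected = math.log10(1 + 1 / d)   # Benford's predicted frequency
    print(d, round(observed, 3), round(expected, 3))
```

This skew toward small leading digits is exactly what fraud examiners exploit: invented figures tend to have much flatter digit distributions than real ones.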

Six_Sigma
A product standard for allowed defects; in practice it amounts to 4.5 sigma. Six_Sigma (wiki)  

Control charts , also known as Shewhart charts (after Walter A. Shewhart) or process-behavior charts, in statistical process control are tools used to determine if a manufacturing or business process is in a state of statistical control.
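The core of a Shewhart control chart is small: estimate the mean and standard deviation from an in-control baseline run, then flag new points outside mean ± 3 sigma. A minimal sketch with made-up measurements:

```python
import statistics

# Shewhart-style control chart: limits come from an in-control baseline run.
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 9.9]
mean = statistics.mean(baseline)
sigma = statistics.pstdev(baseline)
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma  # upper/lower control limits

# New measurements are checked against the fixed limits.
new_points = [10.0, 10.3, 13.5, 9.9]
out_of_control = [x for x in new_points if not lcl <= x <= ucl]
print(out_of_control)  # [13.5]
```

Computing the limits from a separate baseline matters: including a large outlier in the limit calculation inflates sigma and can hide the very point the chart should flag.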

credit fico
Scoring and modeling, whether internally or externally developed, are used extensively in credit card lending. credit_card ch8 (fdic.gov)  




Historic references


Persons History Of Statistics theory
Persons History Of Operational research
Abraham_Wald (1940s) founded the field of statistical sequential analysis (operational research).




© 2012 J.A.Karman (21 apr 2012)