home-SAS    SAS-SAAS    First steps    Installation    Hardening    Operational    Using    My Notes
Virtual Data    Encoding    Measure/Category    Floating numbers    SQL    Cube    Graphics    Gis    Date-time    Statistics    Data Mining     Miscellaneous     Machine special    top bottom

Support Using notes


Using the environment requires information and education; providing these is one of the tasks of SAS support.
When a problem is found on the SAS supplier's side, it has to be taken care of through external support (SAS track).

For configuration options, performance, and all other technical matters, contact your SAS administrator (the "Platform Admin").
I have made some notes in the Operational chapter.

Not everything learned from real-life experience can be documented. In some cases a blog note may help.


Federation - Strings - Floating

[Figures: simple ETL (left); ETL in reality (right)]

Business Information

The route to the necessary business data is said to be
simple (left/first figure); in reality it can be frustratingly complex (right/second figure).

Federation

To keep the approach as simple as possible, copying of business data should be kept to the basic requirements. Still, working with copied data is very common: it is often easier to get a copy of the DBMS data than to get the access rights organized.

Leaving the data at its origin as much as possible is called federation. I have also seen the term "virtual data warehousing". Data virtualization is the latest name.
Wikipedia - Data virtualization     Data Flux (now SAS)




Encoding - Strings


NLS National Language Support

Encoding deals with the differences in the characters used by all languages. This is still evolving. In the old days only EBCDIC (IBM) and ASCII (non-IBM) existed.
CEDA (Cross-Environment Data Access) is a SAS interconnection feature, originally positioned with SAS/CONNECT (V8.2). Its properties are:
Unicode unicode.net
Unicode (UTF-8, UTF-16, UTF-32) is the standard for representing the characters of every language.
SAS documentation
see: sas-authors-tip-encoding-of-external-files (2012/09/14) SAS reads and writes external files using the current session encoding. This means that the system assumes the external file is in the same encoding as the session encoding. For example, if your session encoding is UTF-8 and you are creating a new SAS data set by reading an external file, SAS assumes that the file's encoding is also UTF-8. If it is not, the data could be written to the new SAS data set incorrectly unless you specify an appropriate ENCODING option.

unicodeserver (sugi28) Multilingual Computing with the 9.1 SAS Unicode Server.
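A minimal sketch of the ENCODING= option (not the example from the tip itself). The file name and the `wlatin1` encoding are assumptions for illustration; the first step only writes a small test file into the WORK directory so that the code is self-contained. Because the fileref carries ENCODING=, SAS transcodes between `wlatin1` and the session encoding when writing and reading.

```sas
/* Hypothetical file in the WORK directory, declared as Windows Latin-1 */
filename extfile "%sysfunc(pathname(work))/customers_latin1.txt" encoding="wlatin1";

/* Write a small test file; SAS transcodes from the session encoding to wlatin1 */
data _null_;
  file extfile;
  put "Jansen;Amsterdam";
  put "Peeters;Antwerpen";
run;

/* Read it back: ENCODING= on the fileref tells SAS the file is not in the session encoding */
data work.customers;
  infile extfile dlm=";" truncover;
  length name $40 city $30;
  input name $ city $;
run;
```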


Encoding different approaches

SybaseIQ (sap) Set Sybase IQ InfoPrimer to Support UTF-8 Encoding .
lrcon (Sybase IQ InfoPrimer 15.3)
 




Sort Sequence

Sorting on a mainframe gives a different order than sorting on other machines such as Windows and Unix. EBCDIC (wiki) and ASCII (wiki) order the characters differently: EBCDIC sorts lowercase letters first, then uppercase letters, then digits; ASCII sorts digits first, then uppercase letters, then lowercase letters.


With SAS it is possible to define an alternate collating sequence (PROC SORT, alternate collating sequence). When an external (host) sort is used, the translation tables must be reversible, because a character translation takes place twice with this option: once before and once after the sort. If the tables are not reversible, the original data can get corrupted.
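As an illustration, assuming an ASCII host such as Windows or Unix, the SORTSEQ=EBCDIC option of PROC SORT reproduces the mainframe order without changing the stored values. The data set names are made up for this sketch.

```sas
/* Three keys that sort differently in ASCII and EBCDIC */
data work.codes;
  input code $;
  datalines;
abc
ABC
123
;
run;

/* Default sort on an ASCII host: digits, uppercase, lowercase */
proc sort data=work.codes out=work.codes_ascii;
  by code;
run;

/* Mainframe-like order: lowercase, uppercase, digits */
proc sort data=work.codes out=work.codes_ebcdic sortseq=ebcdic;
  by code;
run;
```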




Measure/category

There is a confusion from old times about number versus character, probably inherited from the Hollerith punched-card approach: a card only has numbered positions.
Avoid this misunderstanding by using the correct type.

Account numbers, invoice numbers, numeric country codes, etc. should not be used in calculations. They can be used in classifications.
To add to the confusion: International_Bank_Account_Number (IBAN) account numbers contain letters!

When a field is used as a classification, it is better to define it as character (string).
In that case you have to deal with the encoding; only the basic character set is simple.
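A minimal sketch, assuming a 10-position account number (the width and the names are hypothetical): PUT with the Z format turns a numeric value into a character key and restores the leading zeros that a numeric field loses.

```sas
data work.accounts;
  length acct_key $10;
  input acct_num;
  /* Z10. pads with leading zeros to a fixed width of 10 */
  acct_key = put(acct_num, z10.);
  datalines;
12345
9876543210
;
run;
```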

Calculating the mean of invoice numbers or the sum of country codes makes no sense.
When working with numbers, SAS uses floating point. Dates and times are also numbers, but they are marked separately (by their formats).

Oracle and other systems make it more difficult to understand. Computer notations such as zoned decimal and packed decimal exist. These notations are not binary but decimal based. They do not exist as SAS data types and must be converted, for example with the ZDw.d and PDw.d informats.
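A small sketch of reading packed decimal with the PD informat. The bytes are built with hex literals here only to keep the example self-contained; in practice they would come from a mainframe file read with INFILE/INPUT.

```sas
data work.packed_demo;
  /* Two bytes of packed decimal: digits 1, 2, 3 and sign nibble C (positive) */
  packed_pos = '123C'x;
  /* Sign nibble D marks a negative value */
  packed_neg = '123D'x;
  amount     = input(packed_pos, pd2.);    /* 123 */
  amount_neg = input(packed_neg, pd2.);    /* -123 */
  amount_dec = input(packed_pos, pd2.2);   /* two implied decimals: 1.23 */
run;
```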

Oracle and other systems also offer the option of declaring a string (text) that contains only the digits 0-9. Nowadays we would call this a constraint. This must also be converted in SAS.

SAS has options for controlling the conversions when accessing a DBMS; see the SQL paragraph (DBTYPE=, DBSASTYPE=).



Format style - dressing


Format explained

Working with SAS formats feels strange at first. The whole concept is missing in many other tools and languages, although some parts of it can be found. Samples:
What does a SAS format do to a SAS variable?
It changes the way the value is presented to you, without changing the internal value.

Think of it as dressing the same person in a different traditional costume: the presentation is completely different, but the person is the same.
It was the solution for national differences such as decimal point versus comma, date-time notations, some unit conversions (meter/mile), and rounding.

What can you do with formats? Much more: they save processing time and simplify coding logic. Do not change the data and skip the extra SQL processing; just apply a format.

Everything was set up with a single format per variable in mind. What to do when one variable should be classified in multiple ways? You could duplicate the variable or process the data twice; the evolution of formats that solves this is multilabel formats (MLF).
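A sketch of a multilabel format with overlapping age groups on SASHELP.CLASS; the format name and the groupings are illustrative. The MLF option on the CLASS statement makes each student count in every group that matches.

```sas
/* Overlapping ranges need the MULTILABEL option */
proc format;
  value agegrp (multilabel)
    low -< 13 = 'Child'
    13 - high = 'Teenager or older'
    11 - 12   = '11-12'
    13 - 14   = '13-14'
    15 - 16   = '15-16';
run;

/* MLF: every observation is counted in each label that matches its age */
proc means data=sashelp.class nway n mean;
  class age / mlf;
  format age agegrp.;
  var height;
  output out=work.age_summary n=n_height mean=mean_height;
run;
```

With 19 students, the counts over all five labels add up to 38, because every student falls into two groups.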




MLF

See:


doc format

All formats (leforinforref).

Informats can behave differently depending on how they are used: there are many possible input notations and language dependencies. The result is sometimes surprising, but mostly it follows the convention of the specific area of use. See Using Informats (leforinforref).
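A small illustration of the convention dependency, using made-up text: the same string gives two different dates depending on whether it is read with the European DDMMYY or the US MMDDYY informat.

```sas
data work.informat_demo;
  text = '05/04/2013';
  as_ddmmyy = input(text, ddmmyy10.);   /* day first (European): 5 April 2013 */
  as_mmddyy = input(text, mmddyy10.);   /* month first (US): 4 May 2013 */
  format as_ddmmyy as_mmddyy date9.;
run;
```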





ODS

Output Delivery System (ODS) works closely with the presentation.
This part uses styles, conforming to the HTML standard, but RTF, PDF, and other presentation formats are also possible. See:

Within pharma (CDISC) a lot has to be done about documenting: RTF output and its contents are used to build reports in the required way. See:
ODS is becoming a hot topic for delivering data, for example with tagsets for Excel.
See:




Floating


background

Floating point has the same background as the slide rule (remember the old days).
Some properties:
Wikipedia information Floating point   slipstick  
Related to this:

Many people have trouble understanding the problem with precision. It is complex enough to have become a distinct mathematical area.
mandelbrot set   butterfly effect In chaos theory, the butterfly effect is the sensitive dependence on initial conditions, where a small change at one place in a nonlinear system can result in large differences to a later state

Precision

SAS has documented the issues with "numeric precision" :
Other sources SAS:
Other references and techniques:
Other languages:
Some workarounds for financial values (they can be combined):
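The precision problem itself fits in a few lines. This sketch assumes IEEE floating point (Windows/Unix) and shows two common workarounds, comparing with a tolerance or rounding to the required decimals first; the tolerance `1e-9` is an arbitrary choice for illustration.

```sas
data work.precision;
  x = 0.1 + 0.2;
  /* Exact comparison fails: 0.1 and 0.2 have no exact binary representation */
  exact_equal   = (x = 0.3);
  /* Comparison with a small tolerance */
  fuzzy_equal   = (abs(x - 0.3) < 1e-9);
  /* Round to the required decimals before comparing */
  rounded_equal = (round(x, 0.01) = 0.3);
  put x= best32. exact_equal= fuzzy_equal= rounded_equal=;
run;
```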


Arbitrary Precision

When speed is not an issue but precision is, a different approach is possible: Arbitrary-precision_arithmetic.

Related to this:

SAS Integers

Bypassing SAS floating point by using DBMS, Perl, or Java number arithmetic: 021-2008.

Prime numbers and integers used for encryption and random seeds follow special mathematics: Ring (wiki).

The macro language function %EVAL works with integers only. The documented limit on the number of digits (Defining Arithmetic and Logical Expressions, SAS macro reference) is the floating-point one.
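A minimal comparison of the two macro evaluation functions: %EVAL truncates the integer division, while %SYSEVALF does floating-point arithmetic. (A decimal operand such as `0.5` inside %EVAL produces an error instead of a result.)

```sas
%let int_result   = %eval(7/2);      /* integer arithmetic: 3 */
%let float_result = %sysevalf(7/2);  /* floating-point arithmetic: 3.5 */
%put &=int_result &=float_result;
```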


SAS DS2 Data Types

With DS2, another language is introduced in SAS. As it targets the common DBMS data types, it supports those: What are the data types? (DS2 Language Reference)

With GPU usage, and with normal floating point in all other calculations, the limitations mentioned above are still in place. It will be more difficult to apply them correctly in each situation.




Dates representation conversions storage


Time can only be measured as it continuously passes by. There are many standards for it.
A point of zero must be agreed to get an absolute number. There are many points of zero in use.



Date and time conversions are very commonly used, and they mostly go well. When there are problems with interfacing, or when you code conversions yourself, there is a lot to investigate.





Time different notation methods

SAS date & time A SAS time is the number of seconds since midnight (fractions of a second are allowed). A SAS date is the number of days since 1 January 1960. A SAS datetime is the number of seconds since midnight, 1 January 1960.
lrcon language concepts -constants.
 
MS Excel The serial number 1 represents 1/1/1900 12:00:00 a.m. Times are stored as decimal numbers between .0 and .99999, where .0 is 00:00:00 and .99999 is 23:59:59. The date integers and time decimal fractions can be combined to create numbers that have a decimal and an integer portion
kb/214094 microsoft

open office v1.2 oasis open org

Excel specifications and limits microsoft
 
MS SQL server MS SQL has its own internal format to store dates. Date and time data from January 1, 1753 through December 31, 9999, to an accuracy of one three-hundredth of a second (equivalent to 3.33 milliseconds or 0.00333 seconds). Values are rounded to increments of .000, .003, or .007 seconds, as shown in the table.
msdn datetime and smalldatetime
note 13715 comparing dates with SQL-server
 
Oracle DB Oracle uses its own internal format to store dates. Date data is stored in fixed-length fields of seven bytes each, corresponding to century, year, month, day, hour, minute, and second.
Database Globalization Support Guide
SQL Language Reference - 11g Database
 
Java java as char representation.
java dates SAS jdbc drivers cookbook
 
Teradata Teradata uses its own internal storage. It refers to Unicode and ANSI as the standards used.
2005 v2.6 documentation (item 043140011) 1143
SQL reference, internal representation timestamp
Unix - POSIX Epoch and the year 2038 problem.
Number of seconds elapsed since midnight Coordinated Universal Time (UTC) of Thursday, January 1, 1970
Unix_time wiki
 
SMF Mainframe - SAS The information at IBM is hard to find. Because of the mainframe origin of SAS, it is still available, and it is easier to find among the SAS date-time informats.
SAS 9.3 date-time informats rmfstamp smfstamp
 
gps the GPS date is expressed as a week number and a seconds-into-week number. The week number is transmitted as a ten-bit field in the C/A and P(Y) navigation messages, and so it becomes zero again every 1,024 weeks (19.6 years).
GPS time based wikipedia
 
30/360 financial wikipedia Interest calculations often simplify the day count, using months of 30 days and a year of 360 days. ACT/360 and ACT/365 are other methods.
 
timezones differ around the world
 
gregorian Calendars such as this one, the Julian calendar, and many more (see the Wikipedia link).
 
Conversions
Many date conversions are done by the interfaces delivered by SAS. Some conversions will always fail.

Documented in the SAS/ACCESS to PC Files documentation, in the File-Format-Specific Reference.
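A sketch of three of the zero points from the table, assuming the Excel 1900 date system and treating Unix seconds as UTC (SAS datetimes carry no time zone). The offsets follow from the SAS origin: 1 January 1970 is SAS day 3653, and Excel serial 21916 is 1 January 1960. The sample values are illustrative.

```sas
data work.date_origins;
  /* SAS origin: day 0 and second 0 are 1 January 1960, midnight */
  date_zero     = '01JAN1960'd;             /* 0 */
  date_1970     = '01JAN1970'd;             /* 3653 days: 10 years plus leap days 1960, 1964, 1968 */
  datetime_1970 = '01JAN1970:00:00:00'dt;   /* 3653 * 86400 = 315619200 seconds */
  noon          = '12:00:00't;              /* a time: 43200 seconds since midnight */

  /* Excel (1900 date system): serial 21916 is 1 January 1960.
     Only valid from 1 March 1900 on, because Excel counts a non-existent 29 February 1900 */
  excel_serial    = 41379.5;                          /* 15 April 2013, 12:00 */
  from_excel_date = floor(excel_serial) - 21916;      /* SAS date */
  from_excel_dt   = (excel_serial - 21916) * 86400;   /* SAS datetime, keeps the time fraction */

  /* Unix/POSIX: seconds since 1 January 1970 00:00 UTC */
  unix_seconds = 1365984000;                          /* 15 April 2013 00:00 UTC */
  from_unix_dt = unix_seconds + datetime_1970;
  to_unix      = from_unix_dt - datetime_1970;

  format from_excel_date date9. from_excel_dt from_unix_dt datetime20.;
run;
```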



Standards ISO NEN



Standards other





SAS working with Dates & Time

Calculations
Knowing that times are in seconds and dates are in days, calculations are easy. Remember the floating-point property of the numbers: all its effects apply to dates and times too.
60 seconds is 1 minute, 60 minutes is 1 hour, 24 hours is 1 day. A month has 28 to 31 days, depending on the month. A year has 365 or 366 days. (European)



Functions
Date and time functions are well documented (lefunctionsref). Some are very old and still exist; others were newly added in SAS 9.3.

name functions - date time
intnx Increments a date, time, or datetime value by a given time interval, and returns a date, time, or datetime value.
 
intck Returns the number of interval boundaries of a given kind that lie between two dates, times, or datetime values
 
datdif Returns the number of days between two dates after computing the difference between the dates according to specified day count conventions
 
CALL IS8601_CONVERT Converts an ISO 8601 interval to datetime and duration values, or converts datetime and duration values to an ISO 8601 interval
 
And a lot more, ready to use with many options. See the samples in the SAS manuals to understand the many variations.
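A short sketch of INTNX, INTCK and DATDIF with illustrative dates. The INTCK lines show that the default method counts interval boundaries rather than complete intervals, and the DATDIF lines compare actual days with the 30/360 convention mentioned in the date table.

```sas
data work.intervals;
  d = '15JAN2013'd;
  next_month_begin = intnx('month', d, 1);          /* default alignment 'beginning': 01FEB2013 */
  next_month_same  = intnx('month', d, 1, 'same');  /* same day of month: 15FEB2013 */
  month_end        = intnx('month', d, 0, 'end');   /* 31JAN2013 */

  /* INTCK counts boundaries: 31JAN to 01FEB crosses one month boundary */
  months_discrete   = intck('month', '31JAN2013'd, '01FEB2013'd);       /* 1 */
  /* The 'c' (continuous) method counts complete months instead */
  months_continuous = intck('month', '31JAN2013'd, '01FEB2013'd, 'c');  /* 0 */

  /* Day counts for interest: actual days versus 30-day months */
  start_date       = '15JAN2013'd;
  end_date         = '15MAR2013'd;
  days_actual      = datdif(start_date, end_date, 'ACT/ACT');   /* 59 */
  days_30_360      = datdif(start_date, end_date, '30/360');    /* 60 */
  year_frac_30_360 = yrdif(start_date, end_date, '30/360');     /* 60/360 */

  format d next_month_begin next_month_same month_end start_date end_date date9.;
run;
```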
Formats Dates & Time
SAS Date time formats (leforinforref) These cover the commonly known standards used in the US.
As many languages have their own national guidelines, SAS solves this with NLS versions, such as: National Date time formats

With all these formats and informats, conversions should be covered.

--- Tip:
Suppose that, with all these functions and formats, your conversion is still not available. Then change the text notation a little first.
The PUT and INPUT functions can do such a conversion: put input
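A minimal sketch of the tip, assuming the interface delivers the date as the text `20130415`: INPUT turns text into a SAS date with an informat, and PUT turns the SAS date back into text in the notation the next system expects.

```sas
data work.converted;
  raw_text = '20130415';                    /* e.g. text from an interface file */
  sas_date = input(raw_text, yymmdd8.);     /* text to SAS date number */
  iso_text = put(sas_date, yymmdd10.);      /* SAS date to '2013-04-15' */
  eu_text  = put(sas_date, ddmmyy10.);      /* SAS date to '15/04/2013' */
  format sas_date date9.;
run;
```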

Macro values for selections
Selections for a dedicated data usage often repeat the same conditions. These selections depend on the current date (or a simulated version of it), so they are not available as constant values, while selections and intervals require constant values.
--- Tip:
Use a predefined set of macro variables, available site-wide (global).
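A sketch of such a site-wide set, for example in an autoexec. The macro variable names are hypothetical, RUN_DATE is fixed to simulate a run day (in production it could come from `%sysfunc(today(), date9.)`), and the small TRANSACTIONS table exists only to show the selection.

```sas
/* Hypothetical site-wide date macro variables */
%global run_date run_dt month_start prev_month_start prev_month_end;
%let run_date         = 15APR2013;   /* simulated run day */
%let run_dt           = %sysfunc(inputn(&run_date, date9.));
%let month_start      = %sysfunc(intnx(month, &run_dt,  0, b), date9.);
%let prev_month_start = %sysfunc(intnx(month, &run_dt, -1, b), date9.);
%let prev_month_end   = %sysfunc(intnx(month, &run_dt, -1, e), date9.);
%put &=month_start &=prev_month_start &=prev_month_end;

/* Small illustrative table: a date every 5 days from 25FEB2013 to 05APR2013 */
data work.transactions;
  do trans_date = '25FEB2013'd to '05APR2013'd by 5;
    output;
  end;
  format trans_date date9.;
run;

/* The macro variables become constant date literals in the selection */
data work.last_month;
  set work.transactions;
  where trans_date between "&prev_month_start"d and "&prev_month_end"d;
run;
```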




SQL Structured Query Language

The major difference between SAS SQL and the SQL of other DBMSs is that SAS can mix all DBMS-like data sources in its SQL.

Normally, SQL is tied to the DBMS of one supplier.
SQL & SAS datastep
SQL Generic
Positioning SQL & datastep
Choosing datastep or SQL
Advantages/disadvantages datastep-SQL



SQL & datastep

SQL Generic
Information: SAS uses its own SQL implementation, following ANSI SQL.
Differences ANSI Proc SQL (SAS)

Positioning SQL & datastep
SAS PROC SQL can act on all engine types with relational tables, using all SAS functions and all SQL functions.
The syntax and approach are more standardized, as used with relational DBMSs. Many suppliers implement a version of SQL.

The SAS data-step is more a programming language.

A new approach is NoSQL, leaving the DBMS storage approach. It is used with text mining.

Choosing SAS datastep or SQL
Advantages/disadvantages datastep-SQL
SAS Data-step can act on:
More special properties:
SQL is bound to act on:
More special properties:



SQL dictionary sysobjects metadata
Every DBMS has a dictionary describing the tables and their structure: metadata. Other names such as sysobjects and IDD (Integrated Data Dictionary) are also used.

The SAS 9.3 dictionary tables (sqlproc) are a reference. Most of them are also available as SASHELP views. They cover not only tables but a lot more: indexes, options, macros, information maps, functions, formats.
Processing with the dictionary as data-source:
proc sql;
  select * from dictionary.tables where libname="WORK";
quit;

This approach has a disadvantage.
All tables are accessed first, and only the references to WORK, the ones needed, are selected in the last step. This is a performance penalty.
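A sketch of using the dictionary as a data source: list the columns of SASHELP.CLASS and collect their names in a macro variable for generated code. LIBNAME and MEMNAME are stored in uppercase, so the WHERE values are uppercase too. The macro variable name is illustrative.

```sas
proc sql;
  /* List the columns of one table */
  select name, type, length
    from dictionary.columns
    where libname = 'SASHELP' and memname = 'CLASS';

  /* Collect the column names, in table order, for later code generation */
  select name into :class_vars separated by ' '
    from dictionary.columns
    where libname = 'SASHELP' and memname = 'CLASS'
    order by varnum;
quit;

%put &=class_vars;
```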

SQL – Char / floating - Dates
Using the SAS/ACCESS interface, some translation will occur. The translations that cause the most problems are incorrect number/character assignments. In SAS, numbers are floating point, meant for calculation. In databases, numeric fields can be used for text that contains only digits, for example bank accounts or insurance numbers. Calculations on this information make no sense; the string must be kept as identification.


For dates and times many different standards exist, and the programmer must do the translations in a smart way. See note 30473 (PROC SQL's CONSTDATETIME option, as of 9.3 also a system option) and note 30390 (relational database optimization via the SQL procedure).

There are options to change the default translation of variables between the database and SAS. DBTYPE= (create)   DBSASTYPE= (read)



SQL – Implicit <-> Explicit (Pass-through)
Implicit pass-through is chosen by SAS as an optimization.
Explicit pass-through means coding the SQL statements yourself, for one DB engine.
The difference is coding complexity versus performance. Choose appropriately.
Check & Debug: options sastrace=',,,d' sastraceloc=saslog;

SAS can execute SQL on different types of databases, and it can process variables with SAS language functions. This can have the effect of copying the original database tables to your local session first, instead of having the query executed by the DBMS.

SQL – Missings
Perhaps a design flaw of SQL: there is only one type of missing, the value NULL.
More types of missing are possible, such as: "don't want to answer", "is not applicable", "doesn't exist, not applicable", "technical failure or timed out". When more types are relevant for the analysis, some knowledge and interpretation is needed.
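SAS itself does support several kinds of missing: besides `.` there are the special missing values `.A` to `.Z` and `._`. A sketch with hypothetical survey codes (R = refused, N = not applicable); when such data is written to a DBMS, all of them become NULL.

```sas
/* Hypothetical survey answers: R = refused to answer, N = not applicable */
data work.survey;
  missing R N;        /* read R and N in the data as special missing values .R and .N */
  input id answer;
  datalines;
1 4
2 R
3 N
4 .
5 2
;
run;

/* Give each kind of missing its own meaning in reports */
proc format;
  value answerfmt
    .R    = "Don't want to answer"
    .N    = 'Not applicable'
    .     = 'Unknown'
    other = [best8.];
run;

proc freq data=work.survey;
  tables answer / missing;    /* MISSING lists the missing values as separate categories */
  format answer answerfmt.;
run;
```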



SQL – Reserved Words
Reserved words are a big pitfall: a word may be allowed in one DBMS and not in another.

The list of reserved words differs between systems. In Excel almost any name can be used as a column name; other systems have long lists of reserved words.
SAS has internal naming conventions that can be bypassed with name literals ('name'n).

Don't use these words as field names or table names; copying such a table will fail.
Quotes are a common trap when using pass-through code. Double quotes are generally used differently in relational databases than in SAS (in ANSI SQL they delimit identifiers, not strings). As a result, single and double quotes are not interchangeable as they are in SAS.
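A sketch of name literals, assuming VALIDVARNAME=ANY so that non-standard names are accepted; the column names are examples. GROUP is an SQL keyword, so the name literal keeps it from being parsed as one.

```sas
options validvarname=any;        /* accept non-standard variable names */

data work.odd_names;
  'Order Date'n = '15APR2013'd;  /* name with a space */
  'Group'n      = 'A';           /* GROUP is an SQL keyword */
  format 'Order Date'n date9.;
run;

proc sql;
  select 'Order Date'n, 'Group'n
    from work.odd_names;
quit;
```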

SQL – Optimize
SQL performance can gain or suffer a lot from indexes on tables and from the ordering of information. It is a field of study on its own; that is why the DBA (Database Administrator) has become a separate role.


With PROC SQL a lot can be done for performance, but the how-tos are not always easy to understand.
When you run out of resources, this has to be aligned: go back and redesign the table structures (indexes/sorts), the join order, and the WHERE processing.




SQL – Links basics


SAS documentation: PROC SQL and SAS/ACCESS entries are part of standard Base SAS. SQL views (lrcon.pdf) in Base.   SQL Procedure User's Guide (sqlproc.pdf)

SQL generic – Links blogs
Joins explained: sql joins codinghorror 2007/10

SQL – Links tricks
SAS tricks to coding and optimization:



SQL – Oracle Links tricks
Oracle tricks to SQL:





Reading, Combining, and Modifying SAS Data Sets
One of the first things to learn. To be checked when uncertain.
Basics SAS datasets combining, reading modifying (lrcon)
Data Step – dedicated options - Length
Solving problems with the length of variables:
One way is the VARLENCHK= system option.


Set the length explicitly with a LENGTH statement. Keep in mind that a length conversion can be the result of incorrect data sources.
data foo1;
  length pid $30;   /* before SET, so it overrides the length stored in FOO */
  set foo;
run;
Data Step – Binary search
Knowing your data well can lead to surprising solutions. 031-2007 Quick Record Lookup without an Index
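A sketch of the idea behind a lookup without an index: a binary search with SET POINT= on a data set sorted by the key. The data set, the key values and the searched key are made up, and this is not necessarily the exact technique of the paper.

```sas
/* Illustrative lookup table, sorted by KEY: 1, 4, 7, ..., 1000 */
data work.sorted;
  do key = 1 to 1000 by 3;
    value = key * 10;
    output;
  end;
run;

%let search_key = 301;     /* hypothetical key to look up */

data work.found;
  lo = 1;
  hi = nobs;               /* NOBS= is filled in at compile time */
  found = 0;
  do while (lo <= hi and not found);
    mid = floor((lo + hi) / 2);
    set work.sorted point=mid nobs=nobs;
    if key = &search_key then found = 1;
    else if key < &search_key then lo = mid + 1;
    else hi = mid - 1;
  end;
  if found then output;
  stop;                    /* required with POINT=, otherwise the step never ends */
  keep key value;
run;
```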





OLAP, MDX, Cubes, Star schema DWH

Olap mddb

For a long time SAS used the name MDDB (Multidimensional Database) instead of the more common term OLAP. Searching for OLAP gives multidimensional and more. Wiki: OLAP_cube (slice and dice with cubes); Online analytical processing gives more historical background.

A Star_schema is a logical concept with a fact table and dimensions. It is not exactly the same as OLAP, but it can be used together with it. With a cube (or MDDB), the storage of the aggregation tables must be planned. In the old days a PROC MEANS or PROC SUMMARY was sufficient to deliver the reports. With OLAP the goal is to change views interactively. The disadvantage is the requirement of having those aggregation tables.



SAS documentation
link description
tnote_olap SAS papers
cubebuilding SAS white paper cubebuilding
olapug pdf 93 olap users guide
note 44765 When to use the short form of PROC OLAP code versus the long form of PROC OLAP code - olap cube studio
note 45001 File sizes of SAS® OLAP cubes might be significantly larger than Microsoft® SQL Server® Analysis Services cubes generated from the same data
note 45001 Using metadata groups to apply member-level permissions to a SAS® OLAP cube
 

SQL MDX

PROC SQL can do more than just SQL: it can use MDX.

Connection to OLAP.





Pictures: images are more convincing


Dashboarding

This is the easiest-to-understand presentation of results. It is built with the BI server tools.
SAS(R) BI Dashboard 4.31: User´s Guide -Indicator Types by Category (bidbrdug) Data Visualization - SAS/GRAPH Dashboard Samples (index visualisation)



JMP - Visual BI

JMP (Visual BI) is desktop oriented (Windows, Mac). SAS® Visual BI - Dynamic and interactive business visualization SAS® Visual BI, powered by JMP® software, provides dynamic business visualization, enabling business users to interactively explore ideas and information, investigate patterns and discover previously hidden facts through visual queries.
It is becoming part of other licenses, like SAS® Enterprise Miner™. JMP® Pro Now Included: JMP Pro is included with SAS Enterprise Miner 6.2 (SAS 9.2 release) and SAS Enterprise Miner 7.1 (SAS 9.3 release). It runs only on 32-bit or 64-bit versions of Windows XP Professional, Windows Server 2003, Windows Server 2008, Windows Vista (except Vista Home Basic edition) and Windows 7 (except Starter and Home Basic editions).


html gif jpg png

Generating graphics (GIF, JPG, PNG) for use in HTML (web) pages is a function of SAS/GRAPH.
Examples: Generating Animated GIF Images (graphref)

From the visualization index:
Sample 26104: Create a slider chart indicator for a dashboard One of the samples using code (not generated)

With the changed approach (licensing), this type of graphics is becoming normal functionality.
A Primer on ODS Statistical Graphics (statug) Statistical Graphics Using ODS (statug)

A lot is possible, even with plain SAS coding. The GIF approach is back, as it works reliably on all kinds of devices; Flash for animations is getting problematic.
create-animations-like-hans-rosling (Robert Allison, May 2013) robslink (Robert Allison, May 2013)


Insight

SAS/INSIGHT Software SAS/INSIGHT software, a component of the SAS System, is a dynamic tool for exploring and analyzing your data. With it you can examine univariate distributions, visualize multivariate data, and fit models using regression, analysis of variance, and the generalized linear model.

It is part of the product list with SAS/IML, SAS/LAB, SAS/OR, and SAS/QC. It requires Base SAS on Windows.
The dynamic behavior of images is being replaced by JMP (desktop based, with Miner) and is moving more to Visual Analytics (server based), as seen in notes as of 2013.

With Enterprise Miner it is a component of the personal approach: SAS® Enterprise Miner 12.1 - System requirements "SAS Enterprise Miner Personal Client". But it is not mentioned at SAS® Enterprise Miner Desktop 12.1 - System requirements "SAS Enterprise Miner Personal Client".
Enterprise Miner also comes with Anti-Money Laundering: the "SAS Anti-Money Laundering Discovery Client on Windows" includes SAS/INSIGHT. SAS® Anti-Money Laundering 5.1 - System requirements "SAS Enterprise Miner Personal Client"


3d graph

Generating 3D images for the web (HTML)
Sample found (linkedin questions).

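data d3d;
  call streaminit(12345);        /* one seed for the whole step */
  do x=-5 to 5 by 0.1;
    y=rand('normal');
    z=rand('normal');
    output;
  end;
run;

ods html file="c:\temp\junk.html";
goptions reset=all dev=java;     /* Java-based interactive device, as in the original sample */

proc g3d data=d3d;
  scatter y*x=z;                 /* y and x span the horizontal plane, z is vertical */
run;
quit;

ods html close;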

To be checked against the online documentation. This would simplify generating graphs.




Gis, Maps Locations, kml google maps

There are well-known worldwide geographic information systems, such as route planners. Location information for advertising is not free, or it is very scattered across local national markets.

I am missing sources of geographic information outside the Netherlands. The naming of institutes and all regulations are more or less national. When I see or find something, I will document it.


google maps - kml

Using maps is very informative. SAS can generate every kind of file, including KML files. The SAS/GRAPH component contains a lot of geographic maps, and projections have existed for a long time. The question is how to convert the data and connect it to the maps.

coordinates can be in:
  1. WGS84 (Google Maps), or similar
  2. radians (SAS maps)
  3. degrees, a circle of 360°. This is easily divided into halves and thirds and is normally used.
    Nautical: minutes/seconds of arc. 10.000 km / (90° * 60 minutes) gives the 1852 meters of the nautical mile (1 minute).
  4. gradians, a circle of 400 (French). A right angle is 100; from the equator to the pole is about 10.000 km, and the full circle of four right angles gives 40.000 km.
  5. meters / miles on a projected map. Mostly locally defined.



Distances

On a sphere, the great-circle distance is not a straight line on a map.
It requires more complex formulas, such as the Haversine_formula.
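A sketch comparing the Haversine formula on a sphere (a radius of 6371 km is assumed) with the built-in GEODIST function. The coordinates are illustrative: two points on the equator, one degree of longitude apart. GEODIST does not use this simple spherical model, so its result can differ slightly.

```sas
data work.distance;
  /* Illustrative points on the equator, 1 degree of longitude apart (degrees) */
  lat1 = 0; lon1 = 0;
  lat2 = 0; lon2 = 1;
  r_earth_km = 6371;                     /* assumed mean Earth radius, spherical model */
  d2r = constant('pi') / 180;            /* degrees to radians */

  /* Haversine formula */
  a = sin((lat2 - lat1) * d2r / 2)**2
      + cos(lat1 * d2r) * cos(lat2 * d2r) * sin((lon2 - lon1) * d2r / 2)**2;
  haversine_km = 2 * r_earth_km * arsin(min(1, sqrt(a)));

  /* Built-in function: 'D' = coordinates in degrees, 'K' = result in kilometers */
  geodist_km = geodist(lat1, lon1, lat2, lon2, 'DK');
run;
```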

On a projected map, flattened as it is, an xy-plane will do. In a small country like the Netherlands (200-300 km), the difference caused by the flattening is about 10 meters.

Distances are measured as straight lines. Travelling is about routes, and a route taken with a vehicle gives an expected time.
Planning routes means optimizing (minimizing) the time from the starting point to the end point, with possible intermediate route points.
This is a completely different and far more difficult problem to solve.


I measure distances in meters (European). At sea I use nautical miles; other countries use land (statute) miles. A lot of number conversions.

As locations get so much attention, SAS has built in new features (9.2). GEODIST is the calculator.

Only US ZIP codes are handled. zip codes maps is a source for renewed ZIP codes and for finding maps. The ZIPCITYDISTANCE function gives the distance immediately.
That is a pity for European, Asian, and all other users, as codes like ZIP codes are used everywhere in the world.



Using SAS Maps

Proc Gmap - Eguide
using-3rd-party-shape-files-to-build-map-charts-in-sasb




Geo information in the Netherlands

GEODB has a GEOBREED table with ZIP codes (postcodes) and their locations in the Dutch projection.
The church tower of Amersfoort is the reference point of this projection.
Some knowledge is needed to be able to build maps with it.


A lot of personal information can be found. A starting entry can be the ZIP code (postcode). There are many regulations on personal information; it may not be used freely.

link Geo information Dutch related
Rijksdriehoeks triangulation
Geo-visualisatie/Vervolg_GIS
gis.startpagina.nl
rdinfo.kadaster.nl
 
link Geo information Dutch sample
252-2008 kml google maps dutch
 




Statistics

Using statistics is difficult, because being a statistician is a high-level skill (academic master's level). It is strongly related to mathematics: Mathematical_statistics

Many statistical tests require a certain level of accuracy and reliability. That is why you do not code the basics yourself, but use tested software bought on the market, like SAS.

The CDISC area is the clinical health area, which gets much attention in statistics.

Statistics documentation: see SAS classic stat, User's Guides, containing introductions.


row-level Functions, data preparation


row-level Functions, data preparation

datastep - sql
Getting the data clean and reliable is the first step of analysis.
Many statistical functions exist at row level; they should be used with a clear understanding. It is possible to do 3GL coding with SAS, while the statistical procedures are meant for analysis (5GL).
Basic mathematics should be done at row level. The mean of a variable converted by a function can be the necessary starting point.
LAG functions are very useful for taking previous events into account (time series).

Transposing data
When rows and columns must be exchanged so that the data can be processed by another step.

Statistical functions
Instead of a lot of basic coding yourself, use the standard methods; Keywords and Formulas - Simple Statistics (9.3 procedures) shows how the basic statistical functions are calculated.

Statistics documentation see: SAS Base & Procedures
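A sketch of both row-level techniques on illustrative store sales: LAG for the previous month (reset per store with BY-group processing), and PROC TRANSPOSE from one row per month to one column per month. The table and variable names are made up.

```sas
/* Illustrative monthly sales per store (long layout, sorted by store and month) */
data work.sales;
  input store $ month sales;
  datalines;
A 1 100
A 2 120
A 3 90
B 1 50
B 2 65
;
run;

/* LAG: call it on every row, then reset at the start of each store */
data work.sales_change;
  set work.sales;
  by store;
  prev_sales = lag(sales);
  if first.store then prev_sales = .;
  change = sales - prev_sales;
run;

/* PROC TRANSPOSE: one row per store, one column per month (M1, M2, M3) */
proc transpose data=work.sales out=work.sales_wide(drop=_name_) prefix=M;
  by store;
  id month;
  var sales;
run;
```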

Text handling

Text handling has become advanced with Perl regular expressions: the PRX functions. PRXMATCH (regular expression matching) is one of them.
A list of documentation and proceedings: perlintro.html#Regular-expressions (perl.org).
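A sketch of PRXMATCH, validating Dutch postcodes with a simplified pattern (four digits not starting with 0, an optional space, two capital letters). The pattern is an assumption and ignores the few letter combinations that are not issued.

```sas
data work.postcodes;
  infile datalines truncover;
  input postcode $char8.;
  /* PRXMATCH returns the match position, or 0 when there is no match */
  valid = (prxmatch('/^[1-9]\d{3} ?[A-Z]{2}$/', strip(postcode)) > 0);
  datalines;
3811 HG
3811HG
0123 AB
3811 hg
;
run;
```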



Finance

Instead of much basic coding, standard financial functions have become available (9.3). They are the same as those found in many other tools like Excel, and are standardized by OASIS as the "OpenDocument Formula Specification".
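A sketch of a loan payment with the Excel-compatible FINANCE function and the older SAS MORT function; the loan amount, rate and term are illustrative. FINANCE follows the Excel sign convention, so the payment comes out negative.

```sas
data work.loan;
  principal = 100000;        /* illustrative loan */
  rate      = 0.05 / 12;     /* 5% per year, paid monthly */
  periods   = 360;           /* 30 years */
  /* Excel-compatible PMT: negative, because the payment is cash going out */
  payment_finance = finance('PMT', rate, periods, principal);
  /* Classic SAS MORT function: the unknown argument (payment) is left missing */
  payment_mort    = mort(principal, ., rate, periods);
run;
```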




Statistical & mathematical procedures


Statistical & mathematical procedures

Using Statistical procedures
These procedures expect a dedicated layout of the data. The problem is more in preparing the data so that it can be processed usefully.

Additional information
The Glossary and mathematics statistics area chapters contain a lot of links that can serve as a starting point.
The descriptive statistics paragraph contains links to data collections. They are often collected under state responsibility.




Using procedures (ETS OR)
SAS/OR (Operations Research) was (is) a collection of several procedures. The User's Guide has been split up since 9.3 into:
ETS as part:
Simplify the work






Building procedures, Matrices (IML)

Optimize the work





Mining , operational research, web personalization

Miner as part:

See also: SAS advanced statistics User's Guides.


Data analyses


Presentations: SAS Talks, data mining

Using Data Mining in Forecasting Problems

Multiple Imputation for Missing Data

Within data preparation, missing values can be a problem. Multiple imputation is a solution that passes all information on to the analytics stage. Dropping the incomplete data is another option.
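A minimal multiple imputation sketch, assuming SAS/STAT is licensed. CLASS_MISS is a hypothetical copy of SASHELP.CLASS with holes punched in WEIGHT; the three steps are impute (PROC MI), analyse each completed copy (here PROC REG), and combine (PROC MIANALYZE):

```sas
/* Hypothetical: punch holes in WEIGHT to simulate missing values */
data class_miss;
   set sashelp.class;
   if mod(_n_, 5) = 0 then weight = .;
run;

/* 1. Create 5 completed copies of the data (variable _IMPUTATION_) */
proc mi data=class_miss nimpute=5 seed=12345 out=class_mi noprint;
   var height weight;
run;

/* 2. Run the same analysis on every completed copy */
proc reg data=class_mi outest=est covout noprint;
   model weight = height;
   by _imputation_;
run;
quit;

/* 3. Combine the 5 sets of estimates into one result */
proc mianalyze data=est;
   modeleffects intercept height;
run;
```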



Customer concepts


Customer Analytics

This is a whole dedicated area, connecting all kinds of data, tools and analytics to business goals. Not many papers are easily found at SAS; most are product- or client-dedicated:

The OR license (operational research) contains some procedures.

389-2008 SAS(R) for Real-Time Applications. A new product, SAS(R) Real-Time Decision Manager 5.41 (sysreq), requires 9.3; it was renamed again to Decision Services (release 5.5, 2012).

Recommendation engine

This is a generic data mining question (follow the link in the header). Every time the business wants one, it has to be developed or reviewed.
With SAS, building the analyses is possible.

Translation of the generic terms is needed.



To the web


Personalizing Web Content

Communicating via the web brings other challenges.


Running Miner in batch mode

Normally Enterprise Miner (EMiner) is menu-driven. When the data changes, the analysis can be run scheduled (batch) or as a Stored Process.




Miscellaneous






Stored process definition

By default the code is included in the SP definition. It is also possible to use a %include (SAS code) or an AF call (SCL code).
In those cases:

If you do not have any output (proc report/print/gchart/etc) then you do not have to use the %STP macros. See: without-a-report (TA) .


Error handling


differences interactive/batch:

Options to change the default behavior, and advanced error handling such as checkpoint/restart, are available. See Error Processing in SAS (lrcon).
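A sketch of simple error handling in a macro, as an illustration of one option (not the checkpoint/restart facility): the automatic macro variable SYSCC is checked before the next step runs. The macro name CONTINUE_IF_OK and the data set STEP1 are hypothetical:

```sas
/* Hypothetical step whose result decides whether the job continues */
data work.step1;
   set sashelp.class;
run;

%macro continue_if_ok;
   /* SYSCC: the job's condition code (0 = no problems, 4 = warnings, higher = errors) */
   %if &syscc > 4 %then %do;
      %put ERROR: SYSCC=&syscc, the remaining steps are skipped.;
      %return;
   %end;
   proc print data=work.step1(obs=5);
   run;
%mend continue_if_ok;

%continue_if_ok
```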


MP connect parallel processing

Parallel processing can save wall-clock time: the time spent waiting for results.
MP CONNECT, X commands, grid and scheduling are all possibilities.
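A minimal MP CONNECT sketch, assuming SAS/CONNECT is licensed. Two child sessions on the same machine sort in parallel and write into the parent's WORK library; the task names TASK1/TASK2 and the libref PWORK are hypothetical:

```sas
/* Start two SAS/CONNECT sessions on the same machine; they see the */
/* parent's WORK library under the libref PWORK                     */
signon task1 sascmd="!sascmd" inheritlib=(work=pwork);
signon task2 sascmd="!sascmd" inheritlib=(work=pwork);

/* WAIT=NO: both jobs run at the same time */
rsubmit task1 wait=no;
   proc sort data=sashelp.class out=pwork.by_height;
      by height;
   run;
endrsubmit;

rsubmit task2 wait=no;
   proc sort data=sashelp.class out=pwork.by_weight;
      by weight;
   run;
endrsubmit;

/* Block until both tasks are finished, then close the sessions */
waitfor _all_ task1 task2;
signoff _all_;
```

SASCMD="!sascmd" starts each child with the same command as the current session; WAITFOR makes the parent wait for both children before it continues.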


Date time moving windows

Selective datetime windows based on the (simulated) current date can be very handy.
A dedicated format can be needed, as required by an external DBMS:

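/* Picture format: writes a datetime value as YYYY-MM-DD HH:MM:SS */
proc format;
    picture fmt other='%Y-%0m-%0d %0H:%0M:%0S' (datatype=datetime);
run;

/* Start (b) and end (e) of the previous month (-1) relative to now; */
/* replace datetime() by a fixed value to simulate another date      */
%let mnthstrt = %sysfunc(intnx(dtmonth,%sysfunc(datetime()),-1,b),fmt21.);
%let mnthend  = %sysfunc(intnx(dtmonth,%sysfunc(datetime()),-1,e),fmt21.);
%put &=mnthstrt &=mnthend;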




Date time selecting

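Question: I have a directory that holds result datasets with different dates.
How can I identify the most recent dataset and read it into further processing?

Using PROC DATASETS (selecting on the creation date):
/* List every data set in library LIBREF with its creation datetime (CRDATE) */
proc datasets library=LIBREF nolist;
   contents data=_all_ memtype=data out=a noprint;
quit;

/* OUT= has one row per variable; keep one row per data set */
proc sort data=a(keep=libname memname crdate) out=v nodupkey;
   by memname;
run;

/* Newest data set first; store its two-level name in &current_table */
proc sort data=v;
   by descending crdate;
run;

data _null_;
   set v(obs=1);
   call symputx('current_table', catx('.', libname, memname));
run;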


As you have a date string in the table name, I would use this information for table selection, as done in the code below. (Patrick Matter)
data work.TestFile_15Jan2012
   work.TestFile_19Oct2012
   work.TestFile_25Jun2012;
   var='something';
   run;

proc sql noprint;
   select cats(libname,'.',memname) into :current_table from dictionary.tables
   where libname='WORK' and upcase(scan(memname,1,'_')) = 'TESTFILE'
   having input(scan(memname,-1,'_'),date9.)=max(input(scan(memname,-1,'_'),date9.)) ;
   quit;


data want;
   set &current_table;
   run;




Mail processing

Eguide
You can open your MS Outlook mail as a data store. All mail content should be retrievable: just open Outlook as a data source.
For attachments there is only an indicator of whether one is present. Processing the attachments themselves requires another program.

Outlook must open the default profile as configured in the Outlook/Windows setup, so that the right Outlook mail/user folder starts up automatically. When the profile is not known, search (Google) for "profile e-mail".
Other interfaces (base)
Searching for "reading Outlook mails" gives many hints. VB, Access, SQL Server: all kinds of Microsoft tools are able to process it. Linking Outlook to a table (Access) opens a way to access it by ODBC.
SMTP, IMAP, VIM
Sending mail programmatically





Magic string

Sometimes your source code gets confused by unbalanced tokens (quotes, comments, parentheses, macro definitions) and your session no longer reacts.
Try to close the unbalanced token. When nothing helps, restart your session and debug the code.
Magic string(s):
;*';*";*/ *))%*))*/; ;;;; %mend; options notes; run cancel; quit;

The comment indicator (*) starts a comment statement that ends at the next semicolon.
You can follow this comment start with every unbalanced closing token.


Names - rules - length

In the SAS language, names are one of the basic concepts. Between releases some rules change, such as the maximum allowed lengths.
lrcon Names in the SAS Language - maximum length.


quotes - quoting

In the SAS language, single and double quotes are both used for strings. The difference lies in macro processing: macro variable references inside double quotes are resolved, inside single quotes they are not.
mcrolref macro language reference - using macro vars.
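A small sketch of the quoting difference; the macro variable NAME and the data set QUOTES are hypothetical:

```sas
%let name = Alfred;

data quotes;
   length dq sq $20;
   dq = "Hello &name";   /* double quotes: the macro processor resolves &name */
   sq = 'Hello &name';   /* single quotes: the text stays exactly as written */
   put dq= / sq=;
run;
```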


DOW loop - hashing

DOW loop (Dorfman, Whitlock): the SET statement is placed inside DO UNTIL (last.<by-variable>).

Hashing aims at fast data access (an indexed table lookup) by using internal memory. kb/24/667: sample loading data into a hash object (memory). Using the hash iterator (lrcon).

SASFILE Statement (lestmref): this statement loads the entire dataset into memory.
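A minimal DOW loop sketch on SASHELP.CLASS: each iteration of the DATA step reads one complete BY group, so the totals need no RETAIN and are written once per group. The output names are hypothetical:

```sas
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

/* DOW loop: one pass of the DATA step reads one whole BY group */
data sex_totals;
   do until (last.sex);
      set class_sorted;
      by sex;
      n            = sum(n, 1);               /* starts missing for each group */
      total_weight = sum(total_weight, weight);
   end;
   mean_weight = total_weight / n;            /* implicit OUTPUT: one row per group */
   keep sex n total_weight mean_weight;
run;
```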

Directory Listing

Functions such as DOPEN, DINFO and DOPTNAME are available (9.3); there is also FDELETE.
With these functions there should be no need to use X commands or piping methods anymore.

With a wildcard in the input filename there is an easy way to get all files processed in one DATA step. See: 166-2008 infile and file (global forum).
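A sketch of a directory listing with the D-functions, without X commands or pipes. The WORK directory is used only because it always exists; substitute any path:

```sas
/* Any directory will do; WORK is used because it always exists */
%let dir = %sysfunc(pathname(work));

data dir_listing;
   length fref $8 member $256;
   rc  = filename(fref, "&dir");   /* blank FREF: SAS generates a fileref */
   did = dopen(fref);
   if did > 0 then do;
      do i = 1 to dnum(did);
         member = dread(did, i);   /* name of the i-th directory member */
         output;
      end;
      rc = dclose(did);
   end;
   rc = filename(fref);            /* release the generated fileref */
   keep member;
run;
```

For reading the file contents themselves, a wildcard in the INFILE physical name, combined with the FILENAME= option, tells you which file each record came from.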


Cobol - Mainframe PIC copybooks

This is not modern SAS practice; it dates from very old times.
It can be very handy in conversion projects:





Selecting by number of observations

Many logically correct solutions exist. When performance matters, reading just a small part makes the difference.
Key options: FIRSTOBS= and OBS=
POINT=
Using the POINT= option, only the requested observations of the data set are read:
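data lastfew;
   /* NOBS= is filled at compile time, so it is known before the SET runs */
   do point = max(1, nobs-9) to nobs;
      set sashelp.class point=point nobs=nobs;
      output lastfew;
   end;
   stop;   /* needed with POINT=: end of file is never detected */
run;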

Art Carpenter: For most data sets it probably does not matter, however for very large data sets you may not want to read all observations and then throw away all but 10 using the subsetting IF.
Getting NOBS in SQL
proc sql;
   /* First row of the last 10 observations (never below row 1) */
   select max(nobs-9, 1) into :last10 trimmed
      from dictionary.tables
      where libname='SASHELP' and memname='CLASS';

   select * from sashelp.class(firstobs=&last10);
quit;

The disadvantage here is that dictionary.tables reads the information of all tables before doing the selection.

Getting NOBS by %sysfunc open, attrn, close
%let dsid=%sysfunc(open(one));
%let num=%sysfunc(attrn(&dsid,nlobs));
%let rc=%sysfunc(close(&dsid));
%put There are &num observations in dataset one.;

kb25078
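Another way, sketched here on SASHELP.CLASS: the NOBS= option of SET is filled in at compile time, so the DATA step can stop before reading a single observation:

```sas
/* NOBS= is filled in at compile time, so no observation is read */
data _null_;
   if 0 then set sashelp.class nobs=n;
   call symputx('nobs', n);
   stop;
run;

%put NOTE: sashelp.class has &nobs observations.;
```

For views and some DBMS tables the number of observations may not be known in advance, so check the result there.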

Performance by user choice

CLASS / BY processing, macros
Generating SAS code with macros is the slowest option.
BY processing runs through a dataset once. It requires a sorted (grouped) dataset; within simulations you will have the classes ordered.
CLASS processing eliminates the need for a sort step.
Blogs: macros or not.
Simulation in SAS: The slow way or the BY way (Rick Wicklin|July 18, 2012)
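A sketch of "the BY way" for simulation, following the idea of the blog above: generate all samples in one DATA step and analyse them with one BY-group procedure call instead of a macro loop. The sample sizes and the seed are arbitrary choices:

```sas
%let nsim = 1000;   /* number of simulated samples */
%let n    = 10;     /* observations per sample */

/* Generate all samples in one DATA step; SAMPLEID comes out in sorted order */
data sim;
   call streaminit(12345);
   do sampleid = 1 to &nsim;
      do i = 1 to &n;
         x = rand('normal');
         output;
      end;
   end;
run;

/* One procedure call analyses every sample through BY processing */
proc means data=sim noprint;
   by sampleid;
   var x;
   output out=sim_means mean=sample_mean;
run;
```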



hashing - io memory

Hashing aims at fast data access (an indexed table lookup) by using internal memory. kb/24/667: sample loading data into a hash object (memory).
Using the hash iterator (lrcon).
Dictionary of Hash and Hash Iterator Object Language Elements (lecompobjref).

SASFILE Statement (lestmref): this statement loads the entire dataset into memory.
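A minimal hash lookup sketch: SASHELP.CLASS is loaded into memory once, and each incoming name (hypothetical input lines) is looked up with FIND:

```sas
data lookup;
   /* Compile time only: defines NAME and AGE in the PDV with the right attributes */
   if 0 then set sashelp.class(keep=name age);
   if _n_ = 1 then do;
      /* Load the lookup table into memory once */
      declare hash h(dataset: 'sashelp.class(keep=name age)');
      h.defineKey('name');
      h.defineData('age');
      h.defineDone();
   end;
   input name $;
   /* FIND returns 0 when the key exists and then fills AGE */
   if h.find() = 0 then found = 1;
   else do;
      found = 0;
      age = .;   /* AGE is retained, so clear it explicitly */
   end;
   datalines;
Alfred
Nobody
;
run;
```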


Source , reverse engineering

With SAS you can use macros to embed source. The generated code can change every time, so reverse engineering from log files is not (fully) possible.

Scaproc analyses your source.
EGuide will reformat hand-made code.
See the blogging / docs paragraph "new goodies Eguide": a lot of options to analyse, adjust and change code.


On Unix, SAS source can be recovered from a SAS log with:
sed '/^[^[0-9]/d;/SAS/d;/^$/d;/DATA?:?:?/d; /\*[0-9]/d' --saslogfile.log-- |cut -c6-256 | tr ! ' ' |sed '/^[^ ]/d'> --output.sas--

Instruction: in this statement, fill in the name of your SAS log file and the name of the destination SAS source.

More ways to get source code out of a SAS-log:
nesug 2006 po18 convert log to sas code





Enhanced Editor (base) - KMF files

The Enhanced Editor (EGuide / Base 9.1.3) can do many tricks. See note 19/335 on keyboard macros; 077-2009 shows how they can be distributed.
wiki/Abbreviations/Macros sascommunity

EGuide promises to have these features as standard in new versions.


Automating commands

DM Statements Unveiled Agarwal
Repeating actions in daily work.


tracing progress - performance

Classic: follow the messages in the SAS log.
For performance, messages to the SAS log and all progress updates should be removed.

sysecho: SYSECHO, following progress in EGuide (9.2 and up).
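A small SYSECHO sketch: the text should appear as the task status in Enterprise Guide while the following step runs. The steps themselves are arbitrary examples:

```sas
sysecho "Step 1 of 2: sorting";
proc sort data=sashelp.class out=work.sorted;
   by name;
run;

sysecho "Step 2 of 2: summarizing";
proc means data=work.sorted noprint;
   var height;
   output out=work.summary mean=mean_height;
run;
```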

Arm: 9.3 Arm Why?     9.3 adding Arm to an application.


terminal, connections

If you run a server SAS session interactively, the SAS session assumes that, by using a dialog box, you can resolve any problems that it encounters. While the SAS session waits for a response to its query, the server might not be able to continue to service client requests until the query is answered. However, you might not be aware that a response is required if the window in which the server is running is not visible or is not being monitored. Therefore, it is recommended that you specify the SAS system option NOTERMINAL so that SAS does not display dialog boxes and performs whatever is required without prompting.
In background processing the default is to stop the process.



Links to other blogs & docs



Tips Blogs

Information by blogs, sites etc.


Eguide blogs - tips
145-2012 SAS Enterprise Guide 4.3: Finally a Programmer´s Tool navigation, Flows, legacy code
306-2011 Working with a DBMS using SAS Enterprise Guide sastrace sql_ip_trace readbuff, in-database best practices
eguide split screens banana (CH)
Eguide Tips organizing projects (CH)
Reconnect Query Nodes to New Input Data organize (amadeus)
Copy files anywhere not syscopy (CH)
"Files" Folder direct access (Steve Overton)
"Files" EG 5.1 Data Exploration (TA)
the-autoexec-process-flow native libnames (CH 2011)
3 Ways to Use the Note Feature (TA)
eguide sysecho progress sas program (CH)
137-2010 New Goodies 4.3 scaproc, syntax suggest,
tip-help (AB,CH,SS )
noautoscroll performance (amadeus)
Office Excel blogs
methods export excel (C.Hemedinger)
got easier export to excel (CH)
eguide import excel (Tricia Aanderud)
247-2012 import export excel (stratia)
065-2012 using java to preformatted excel instead of DDE
150-2012 An Introduction to Creating Multi-Sheet Microsoft Excel Word
TASS-2011 10 Most Frequently Asked Questions of Exporting to Excel
144-2010 Choosing the Right Tool from Your SAS and Microsoft Excel Tool Belt
using powershell data-set-viewer (CH)
behind the scenes importing excel (CH)
Do I have to choose between Excel and SAS? (Charu Shankar) February 2012


BI web blogs
sas-portal-web-report-studio-easycustomizations (Tricia Aanderud) -2012
creating-custom-di-transformations-using-di-studio-macro-variables (amadeus) -2012
stored_process_minimalist_programming (Quentin McMullen)
%include usage for code - 2012
3 Tips to Improve Your Prompts (TA) Stored processes - may 2012
Using Regular Expressions When Editing Code in Enterprise Guide (amadeus) - feb 2013
base blogs
character translate-remove data cleaning up (Mark Jordan)
roll-your-own-function fcmp - extraordinary functions (MJ)
check-the-log errors/notes (TA)
Capture New Twitter Followers (outlook) Myth Busted: You can Code in EG (TA)
138-2010 PROC DATASETS; The Swiss Army Knife of SAS(R) Procedures


Analytics blogs
Finding patterns in big data with SAS/GRAPH transparent markers 9.3 (Robert Allison)
6 questions with data mining expert, John Elder (Michele Reister) January 2012
Miscellaneous generic
176-29 Communication-Effective Use of Color for Web Pages, Graphs, Tables, Maps, Text, and Print (LeRoy Bessler)
Color me happy (TA 2013 feb all analytics)
Miscellaneous to OS
PC SAS Programmer Facing UNIX? SAS Enterprise Guide to the Rescue (TA 2013 mar)




Machine specific options & behavior





SAS quirks


Language Settings , CSV - decimal point
With the old comma-separated files there was always this problem:
Behavior issue in SAS V9.3 with PROC EXPORT using DBMS=DBF: 9.3 not exporting data as expected.
Interface to PC Files: 9.3 Reference - dBase DBF Files Essentials. When the language setting changes the decimal point, DBF output should be corrected to use a period (.):
options ctrydecimalseparator='.';


Access Databases
DB2decpt option (acreldb)

Alignment
ODS Decimal_align (odsug)

MBCS DBCS SBCS
Internationalization Compatibility for SAS String Functions (nlsref)
Contains a list of functions limited to DBCS or SBCS. With unicode (MBCS) care should be taken.




Unix


Personal Settings
All personal settings (SAS datasets, SAS registry) are stored under the user's "home". The user's "home" is the normal Unix-defined location, set as: "/home/<Business-Department>/<User-Key>"

A short name for accessing this location is ~ (tilde); "cd ~" always goes to the personal location. SAS uses a similar convention: when no full name is defined, the personal home is used as the first part to build a full name.

Defining a directory such as "MY_SASfiles" gives a location for all kinds of files (personal or department): sources, scripts, macros, datasets.
Doing this with "<Business-Department>" and "<User-Key>" needs support in naming conventions and startup conventions:
Personal Settings, Unix: no support
Have you convinced them that the "/home/---" location is the correct place to work in?

To do: mount points are needed to organize the amount of storage.
Place them at "/home/<Business-Department>" and you have the responsible <Business-Department> to deal with (accountable/chargeable).

Unix support, often difficult, is not used to working towards business goals like this.
Arguments: technically it is easy to solve. It is very difficult to solve when:
a/ there is no management support for meeting the requirements
b/ there is no cooperation at OS level (Unix).





Windows


Personal Settings
Where to find them and how to maintain them: Windows keeps some environment variables. In a DOS box (cmd.exe) the set command gives information about your session.
ALLUSERSPROFILE=C:\ProgramData
Your settings APPDATA=C:\Users\.....\AppData\Roaming
HOMEDRIVE=C:
HOMEPATH=\Users\.....
Your settings LOCALAPPDATA=C:\Users\.....\AppData\Local
NUMBER_OF_PROCESSORS=8
PUBLIC=C:\Users\Public
Your temp TEMP=C:\Users\.....\AppData\Local\Temp
your temp TMP=C:\Users\.....\AppData\Local\Temp
USERNAME=.....
USERPROFILE=C:\Users\.....
windir=C:\Windows
Windows clipboard
Can be used with filename / file (93) 93 clipbrd reference
 FILENAME fileref CLIPBRD ; 
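A sketch of the CLIPBRD access method (Windows only): two lines are written to the clipboard and read back as data. The fileref CLIP and the data set FROM_CLIP are hypothetical:

```sas
/* Windows only: the CLIPBRD access method reads and writes the clipboard */
filename clip clipbrd;

/* Write two comma-separated lines to the clipboard */
data _null_;
   file clip;
   put 'name,age';
   put 'Alfred,14';
run;

/* Read them back, skipping the header line (works the same after copying cells from Excel) */
data from_clip;
   infile clip dsd truncover firstobs=2;
   input name :$8. age;
run;
```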



Windows modules calls

Windows UNC naming of a drive letter, by calling WNETGETCONNECTA
Can be used with filename / file (93) The Path, The Whole Path, And Nothing But the Path, So Help Me Windows
Windows date notation, TypeGuessRows by registry (xcmd)
Registry retrieve windows-reg-query-from-sas (CH 2012 may) See also installation/hardening chapter

Eguide Xceed
Error message with: xceed.filesystem.
Searching for this text leads to Xceed (xceedsoft): Xceed.FileSystem.StreamFile   Zip_NET_Intro xceed.com

The error in this case indicates problems storing or retrieving the EGuide project.

EGuide projects appear to be built as zipped files, the same as .docx (Word), .xlsx (Excel) and .pptx (PowerPoint).
The technical call is something like "getfilefromzipfile". Xceed Zip is a toolkit for VB and .NET to work with zip files, FTP and a lot more. This explains the very advanced functionality in EGuide.





Mainframe

Personal Settings
As SAS was designed with the personal approach, it shares the TSO (Time Sharing Option) of the mainframe (classic approach).
As using SAS on a mainframe requires TSO, you can check this part in case of problems; it gives better error messages or other indications than are possible with SAS client usage.

SAS needs a profile dataset (SASUSER), commonly placed under the USER prefix. An alias (TSO) must be defined to be able to have personal datasets.

Unlike other computer systems:
3270 terminal usage
3270 is the protocol used with IBM mainframe terminals. Most people just remember the old days of 24*80 and green.
SAS can be used on a 50*150 screen with graphics and mouse. It has its own basic layer of 3270 interfaces, built to do the same thing on all machines.

Some issues using 3270 are:
keys on mainframe - CUA - personal
CUA (Common User Access) was an IBM standard defining how keys should be used for logical functions. Much of it has become the common approach.
The 3270 and mainframe behave somewhat differently from PCs. The scrolling keys have a different meaning: they scroll in the visible area on the screen, not in the logical data.
For scrolling the following keys are used:
Setting keyboard layouts in SAS is optional.
Saving personal settings of keys is possible.
Advice: Remember these settings as they can be lost at migrations (SASUSER)





© 2012 J.A.Karman (16 feb 2012)