DevOps Data - practical data cases

🎭 index references    elucidation    metier 🎭
👐 top    mid    bottom   👐

⚙  bpm   sdlc   bianl   data   meta   math   ⚙
⚖   Intro   perftun hardware   DI control   transform flow   schedule flow   What next   ⚖

Patterns for quick, good realisations

Data building, data transport, coding.

Building data   design data   devops meta   devops bianl   design meta   design math
The data explosion: what has changed is the amount of data we are collecting, measuring processes as new information (edge).

📚 Information requests.
⚙ Measurements & monitoring.
🎭 Agility for changes?
⚖ Solution & performance acceptable?

🔰 Too fast .. previous.


Reference          Topic                                        Squad
Intro              Data building, data transport, coding.       01.01
perftun hardware   Performance & Tuning - Software, Hardware.   02.01
👓 Performance OS   Performance at OS level                      02.02
DI control         Data Integration - Control & performance.    03.01
👓 ETL constructs   ETL constructs & performance                 03.01
transform flow     Transformations - data lineage.              04.01
👓 Data flow        Data lineage & xml, json                     04.01
👓 NLS              National language support                    04.02
schedule flow      Scheduling, planning operations.             05.01
👓 Scheduling       Scheduling                                   05.01
What next          Change data - Transformations.               06.00
Following steps                                                 06.02


Duality service requests

From the organisation (bpm) there are two goals behind their solution questions for improvements:
  1. the core business process (sdlc)
  2. reviewing and governing the business process (bianl)
Solutions for those goals are different, although some tools could be the same.


Performance & Tuning - Software, Hardware.

Solving performance problems requires understanding of the operating system and hardware. That architecture was set by von Neumann (see design-math).
Basic resource cooperation & dependencies
A single CPU, limited internal memory, and external storage.
The time differences between those resources span orders of magnitude (factors of 100-1000).

Optimizing is balancing between choosing the best algorithm and the effort needed to implement that algorithm.
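A minimal sketch of that trade-off, assuming Python: the same membership question answered with two data structures. The better algorithm (hash lookup in a set) costs one extra line of preparation, yet beats a linear scan of a list by orders of magnitude.

```python
import timeit

# Membership tests: O(n) scan in a list versus O(1) hash lookup in a set.
keys = list(range(100_000))
as_list = keys
as_set = set(keys)

probe = 99_999  # worst case for the list: a full scan to the last element

t_list = timeit.timeit(lambda: probe in as_list, number=100)
t_set = timeit.timeit(lambda: probe in as_set, number=100)

print(f"list scan : {t_list:.4f}s")
print(f"set lookup: {t_set:.4f}s")
```

The effort of building the set pays off as soon as more than a handful of lookups are done, which is exactly the balancing act described above.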

New resources & dependencies
That concept didn't change that much. Neglecting performance questions could once be justified by advances in hardware, so the knowledge of tuning processes was ignored. Those days are gone.

👓 The Free Lunch Is Over.
A Fundamental Turn Toward Concurrency in Software,
by Herb Sutter. (2009)
If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency. Now is also the time for you and your team to grok concurrent programming's requirements, pitfalls, styles, and idioms.
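As a hedged sketch of Sutter's advice in Python: isolate the expensive per-record operation as a pure function, then express the work as independent tasks. The `transform` function and worker count here are illustrative assumptions; for truly CPU-bound work in CPython you would swap `ThreadPoolExecutor` for `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(record: int) -> int:
    # Placeholder for an expensive per-record operation.
    return record * record

records = list(range(10))

# Sequential baseline.
sequential = [transform(r) for r in records]

# The same work expressed as independent tasks the pool can run concurrently.
# For CPU-bound work in CPython, use ProcessPoolExecutor instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    concurrent_result = list(pool.map(transform, records))

assert concurrent_result == sequential  # same answers, concurrency-ready shape
```

The point is structural: once the hot spot is an independent, side-effect-free task, moving between sequential, threaded, and process-based execution is a one-line change.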

Additional components: the connection from the machine, with multiple CPUs and several banks of internal memory, to multiple external storage boxes over a network. Storage in a network can be a SAN (Storage Area Network) or a NAS (Network Attached Storage). They are different in behaviour and performance.

There is a belief that this is not a business issue but purely technical. That is a wrong assumption. Consider BCM (business continuity management, see ENISA), a part of risk management.
Business Continuity is the term applied to the series of management processes and integrated plans that maintain the continuity of the critical processes of an organisation, should a disruptive event take place which impacts the ability of the organisation to continue to provide its key services. ICT systems and electronic data are crucial components of the processes and their protection and timely return is of paramount importance.
Business applications and their performance are the other reason to do this, with good metrics, led by the business organisation.

Data Integration - Control & performance.

Extract Transform Load (ETL) is the old classic approach for a dedicated datawarehouse whose only goal is delivering reports (dashboards). A more practical approach is Extract Load Transform (ELT), if only for the segregation of hardware resources.
Performance of data processing
Performance is impacted by dependencies at the hardware level. Managing tables like a DBA does means dealing with those dependencies.
IBM DB2 DIO/CIO 👓 Use concurrent I/O to improve DB2 database performance (IBM, 2012)
Concurrent I/O and cached I/O are features that are typically associated with file systems. Because of this, most DB2 DBAs think that the use of these two technologies lies within the purview of storage and system administrators.
However, leveraging this technology in a DB2 database environment is the responsibility of the DBA

In this article DB2 is classified as the "application"; that is confusing when that word is otherwise used for business logic.

ELT processing, pre & post steps
For Extract / Load processing there are many tools, thanks to 👓 CWM (the Common Warehouse Metamodel specification). However, doing that in real life, something is missing: control & monitoring.

For monitoring and control: this kind of logic is only possible with adjusted pre and post processes in place. Such logic is difficult to solve with an external generic provision; it is relatively easy with local customisation using local naming conventions.
Details are found in the paragraph linked 👓 with the figure:
DI control performance
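The pre/post idea above can be sketched as a wrapper around a load step. This is a hypothetical illustration, not the document's actual implementation: the job name `stg_customer_load`, the in-memory `run_log`, and the row-count convention are all assumptions standing in for a real control table and local naming conventions.

```python
import datetime

run_log = []  # stands in for a control table keyed by local naming conventions

def run_with_control(job_name, step):
    """Hypothetical pre/post wrapper: record the start, run the step, record the outcome."""
    entry = {"job": job_name, "start": datetime.datetime.now(), "status": "running"}
    run_log.append(entry)                  # pre step: announce the run
    try:
        entry["rows"] = step()             # the load step reports rows processed
        entry["status"] = "ok"             # post step: record success
    except Exception as exc:
        entry["status"] = f"failed: {exc}" # post step: record failure for monitoring
        raise
    finally:
        entry["end"] = datetime.datetime.now()
    return entry

result = run_with_control("stg_customer_load", lambda: 1250)
print(result["status"], result["rows"])
```

Because the wrapper owns both ends of every step, monitoring becomes a query on the control log rather than a feature each load has to provide itself.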

Transformations - data lineage.

Knowing what information from what source is processed into new information at a new location is lineage (derivation).
data lineage
Data lineage states where data is coming from, where it is going, and what transformations are applied to it as it flows through multiple processes. It helps understand the data life cycle. It is one of the most critical pieces of information from a metadata management point of view
It is called 👓 "data lineage". (science direct articles)
Details are found in the paragraph linked 👓 with the figure:
full pull push request service delivery
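A minimal sketch of recording lineage as metadata, with hypothetical table names (`crm.customers`, `stg.customers`, `dwh.dim_customer`) chosen purely for illustration: each processing step registers where data came from, what was done, and where it went, so the chain can later be walked backwards.

```python
lineage = []  # stands in for a metadata store of lineage records

def record_lineage(source, transformation, target):
    # Each processing step registers source, operation, and destination.
    lineage.append({"source": source,
                    "transformation": transformation,
                    "target": target})

record_lineage("crm.customers", "deduplicate on customer_id", "stg.customers")
record_lineage("stg.customers", "join with region dimension", "dwh.dim_customer")

# Walking the chain answers: "where does dwh.dim_customer come from?"
upstream = [step["source"] for step in lineage
            if step["target"] == "dwh.dim_customer"]
print(upstream)  # ['stg.customers']
```

Real lineage tools store the same three facts per step; everything else (impact analysis, life-cycle views) is derived by traversing these records.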
Normalisation - Denormalisation
In transactional systems it is important to avoid any duplication of an artefact or element, because it is too complex to keep duplications synchronized.

👓 Database Normalization (mariadb) refers to this. The concept of database normalization is generally traced back to E.F. Codd, an IBM researcher who, in 1970, published a paper describing the relational database model.
Third Normal Form (3NF). Denormalization is the process of reversing the transformations made during normalization for performance reasons. It's a topic that stirs controversy among database experts; there are those who claim the cost is too high and never denormalize, and there are those who tout its benefits and routinely denormalize.

Classic Business Intelligence reshapes all data into new, dedicated data models. The facts and dimensions used in the operational process are not suited for reporting and analyses.
When the concepts of a transactional operational data design with normalization are followed, the result is a lot of transformations for tables.
What is delivered as OLAP or reports is denormalised using summaries.
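The denormalising step can be sketched as follows, with invented example tables: a normalized fact table referencing a product dimension by key is joined and summarised into the flat shape a report needs.

```python
from collections import defaultdict

# Normalized design: the fact table holds only a key into the dimension.
dim_product = {1: "bike", 2: "car"}
fact_sales = [
    {"product_id": 1, "amount": 100},
    {"product_id": 1, "amount": 50},
    {"product_id": 2, "amount": 900},
]

# Denormalised summary for reporting: join the dimension in and aggregate.
totals = defaultdict(int)
for row in fact_sales:
    totals[dim_product[row["product_id"]]] += row["amount"]

print(dict(totals))  # {'bike': 150, 'car': 900}
```

The transactional side keeps the single-source-of-truth dimension; the reporting side happily accepts the duplicated product names because the summary is regenerated, never updated in place.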

National Language Support (NLS)
👓 NLS (details in the linked figure). All of this has its impact on the realisation of the data processing. National Language Support (NLS) and localized versions are frequently confused. NLS ensures that systems can handle local language data. A localized version is a software product in which the entire user interface appears in a particular language.
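Why handling local language data matters in a data flow can be shown in a small Python sketch: the same text has different byte representations per encoding, and decoding with the wrong one silently mangles the data (mojibake).

```python
text = "café"  # local-language data with one non-ASCII character

utf8_bytes = text.encode("utf-8")      # 5 bytes: 'é' takes two bytes
latin1_bytes = text.encode("latin-1")  # 4 bytes: 'é' is a single byte

# Round-tripping with the matching encoding is lossless.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text

# Decoding with the wrong encoding silently corrupts the data.
mangled = utf8_bytes.decode("latin-1")
print(mangled)  # 'cafÃ©'
```

In an integration flow every hand-over point (extract files, staging tables, loaders) must agree on the encoding, or exactly this corruption happens without any error being raised.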

Scheduling, planning operations.

Scheduling is the other part of running processes. Instead of defining blocks of code in a program, it is about defining blocks of programs in a process. For a unit of code being built, the word job is used; at the operational department, the word job is used for a process flow with a start and an end. These can get mixed up, but they are really different things.
Building a process flow
Building a process flow (job) means defining the order in which to run code units (jobs). See the figure 👓; details are in the link.
schedule job flow
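Defining the run order of code units can be sketched with a dependency graph, assuming Python 3.9+ and an invented flow of extract/load/report jobs: each job lists the jobs it depends on, and a topological sort yields a valid run order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical process flow: each job maps to the jobs it depends on.
flow = {
    "load_warehouse": {"extract_crm", "extract_erp"},
    "build_reports": {"load_warehouse"},
    "extract_crm": set(),
    "extract_erp": set(),
}

order = list(TopologicalSorter(flow).static_order())
print(order)  # extracts first, then the load, then the reports
```

A real scheduler adds time triggers, retries, and parallel branches, but the core of a process flow is exactly this dependency ordering.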
Time events schedule
Running planned process flows
With the process flow defined, the planning is as in the figure: in the early morning, before office hours, processes run a full load of several warehouses.
The full load in this case was faster than trying to catch all changes. An additional advantage is that missed changes have no big impact, as the longest delay for data is one day.

At the end of this project, three loads for development purposes and 3 (support lines) × 4 (DTAP versions) = 12 loads were run within the available hours. During office hours, regular updates ran every 15 minutes to achieve a near real-time updated version.

Developing a system like this would be easier and more understandable when the scheduling and the code units are designed and built as one system.

Change data - Transformations.

A standardised location in the normal processing of data brings the usual capacity questions.
Capacity considerations.
When the collecting and sending areas of the EDW 3.0 are the most limited ones, the planning for traffic is best done by managing this service.
Modelling data with very detailed relationships should not be the function of a datawarehouse.

In more detail on the transport of data, the data flow goes as follows:

Data warehouse 3.0
All this requires a well-supported way of governing data centrally with a business mindset. The figure shows a design conforming to physical logistics. 👓

Following steps

Missing link: devops bianl   devops meta   design data   design meta   design math
These are practical data experiences.

MetaData Information generic - previous
bianl, Business Intelligence & Analytics 👓 next.

Others are: concepts requirements: 👓
Data Meta Math


© 2012,2020 J.A.Karman