This is an old topic, but I have learned more and come up with more perspectives over time.
- Raw Data vs Clean Data
- What kind of services are required?
- Data as a Service
- Analytics as a Service
Posted by Dylan Wan on April 7, 2017
This is an old topic, but I have learned more and come up with more perspectives over time.
Posted by Dylan Wan on April 1, 2017
RTI stands for Response to Intervention. It is one of the Data-Driven Instruction methods: a way to monitor students' progress, determine what interventions students may need, assign the appropriate intervention to them, and monitor their progress closely.
This is a good demonstration of the need for actionable BI. The system collects information about students and identifies the students who need interventions. The underlying problems may be learning related or behavior related. Read the rest of this entry »
Posted by Dylan Wan on February 16, 2017
An analytics app should be an application. We should not see it as a reporting solution or just a way to view or download data.
The best analytics app is an app that lets people forget it is an "analytics" app.
If you keep the end user in mind and drive the design of the app using user personas, knowing what users really do, the line between an analytics app and other apps should not be very clear.
Posted by Dylan Wan on February 16, 2017
A dashboard is a data-driven application. It is important to think about:
Posted by Dylan Wan on January 29, 2017
I am trying to define the criteria for a schema-less, or schema-on-demand, data storage.
In a relational database, we use DDL, the data definition language, to define the schema.
We have to create a table before we can insert data into it. When we update data, we refer to the columns by name. We use DDL before we use DML.
We have to know the names of the columns in order to construct a SQL statement with a specific select clause. If a column does not exist, the system throws an error when we try to query it. SELECT * FROM does not have that requirement. CREATE TABLE AS SELECT also gives us some tolerance, but the table it creates will be unstable. INSERT .. SELECT * is a bad practice: when the source schema changes, the statement breaks.
Schema describes the name of the table, the names and order of the columns, and the data type (or domain) of each column.
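A minimal sketch with SQLite illustrates the DDL-before-DML ordering and the fragility of INSERT .. SELECT * (the table and column names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL first: the table must exist before any DML can touch it.
cur.execute("CREATE TABLE src (id INTEGER, name TEXT)")
cur.execute("INSERT INTO src VALUES (1, 'a'), (2, 'b')")

# SELECT * works without naming any columns...
rows = cur.execute("SELECT * FROM src").fetchall()

# ...and INSERT .. SELECT * works too, but it silently depends on the
# source schema staying exactly the same.
cur.execute("CREATE TABLE tgt (id INTEGER, name TEXT)")
cur.execute("INSERT INTO tgt SELECT * FROM src")

# Change the source schema and the same statement breaks.
cur.execute("ALTER TABLE src ADD COLUMN created_at TEXT")
try:
    cur.execute("INSERT INTO tgt SELECT * FROM src")
except sqlite3.OperationalError as e:
    error = str(e)  # column count mismatch
```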
Here are what I feel possible and something we can pursue: Read the rest of this entry »
Posted by Dylan Wan on January 26, 2017
An interesting concept.
Posted by Dylan Wan on January 24, 2017
This post comes from notes I took while learning Elastic Stack.
ELK is a combination of three tools:
When ELK changed to Elastic Stack, a fourth tool was added:
There is a lot of information on the net, and I do not want to repeat it. I will just write down my impressions.
In this Elastic world, data are represented, passed, and stored in JSON format. In our relational world, data are represented, passed, and stored in tabular format. Data in tabular format can easily be represented in JSON, but not the other way around. Data stored in XML or JSON can be nested; to store it in normalized form in a relational database, the information about the relationships is lost, or survives only as metadata and keys (PK and FK).
Elastic Search is the engine in the picture. However, the real engine is Apache Lucene. Elastic Search was built on top of Lucene by adding two things: distributed processing and RESTful APIs. Elastic Search / Lucene is the database (index) engine: it organizes the data and processes operations like the CRUD (Create, Read, Update, Delete) operations in a relational database.
We use SQL for CRUD in a relational database; here we use the query language in the RESTful API for similar purposes. To use Elastic Search, you just put the data in, and you get the data out by searching. Knowing how it works internally should help with optimizing and providing a scalable solution, but may not be required.
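As a sketch of how CRUD maps onto the RESTful API: each operation is an HTTP verb against a document URL, with JSON bodies. The host, index name, and document below are made up, and nothing is sent over the network here; the request shapes follow a recent Elasticsearch version.

```python
import json

# Hypothetical cluster endpoint and index; adjust for a real deployment.
base = "http://localhost:9200"
index, doc_id = "logs", "1"

# Create/Update: PUT /index/_doc/id with a JSON document body
create = ("PUT", f"{base}/{index}/_doc/{doc_id}",
          json.dumps({"level": "ERROR", "message": "disk full"}))

# Read: GET /index/_doc/id
read = ("GET", f"{base}/{index}/_doc/{doc_id}", None)

# Search: POST /index/_search with a query DSL body
search = ("POST", f"{base}/{index}/_search",
          json.dumps({"query": {"match": {"message": "disk"}}}))

# Delete: DELETE /index/_doc/id
delete = ("DELETE", f"{base}/{index}/_doc/{doc_id}", None)
```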
Logstash was introduced to me as an ETL tool. However, my impression is that it is a specific, not a generic, transformation tool for handling log information. It provides the specific capability to parse log files; they call this "grok". The technology is built on top of regular expressions. Other impressive features (plugins) are "geoip", an enrichment service, and the "date" processor, which can parse many if not all date formats.
The beauty of this tool is its extensibility. Like ODI, which can be extended by adding Knowledge Modules, Logstash can be extended by adding plugins. R similarly benefits from the packages contributed by the community.
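As a sketch of how those plugins fit together, here is a minimal Logstash pipeline combining grok, date, and geoip; the file path and host are placeholders, and the field names follow the standard Apache log pattern:

```
input {
  file { path => "/var/log/apache/access.log" }
}
filter {
  grok  { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  date  { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
  geoip { source => "clientip" }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```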
Kibana is the BI/visualization tool in the picture. It issues queries to the engine (Elastic Search), gets the data, and displays it in charts or tables.
It seems to emphasize time series analysis, as that is common in log analysis. Otherwise, most of the charts and tables are just like those in other similar tools.
These are source-side agents for collecting information. They extract the data on the source and deliver it to the target.
Posted by Dylan Wan on January 16, 2017
HR is an important business function in any company. It supports business growth. It also satisfies compliance requirements. Reports are required to support the HR functions.
HR reports are surrounding the HR function: Human Resource Planning, Recruitment and Selection, Orientation and Training, Performance Appraisal, Compensation and Benefit.
They serve many different roles and thus have different ways to slice and dice the data.
I found that HR reporting is somewhat challenging:
Posted by Dylan Wan on December 20, 2016
It was a challenge when we tried to build a BI application for Fusion Cloud applications because, unlike the acquired solutions such as RightNow, Eloqua, and Taleo, Fusion Cloud applications did not have web services at that time.
That was the reason Oracle BI Applications Configuration Manager was introduced. It fills the gap by putting an agent on the Fusion cloud application side. The agent talks to Fusion Apps the way OTBI does: it queries the physical layer of the objects from Fusion Apps and downloads the data into CSV files. It is not a web-service-based solution, but for building a custom BI that requires bulk access anyway, it is a good choice.
Once the data is downloaded from Fusion Apps as CSV files, it can be accessed via FTP. Here is the documentation about configuring Oracle BI Apps, but I guess it will work for other FTP clients as well.
It seems that the situation has improved and multiple alternatives are now available.
Another possible way is to use BI Publisher. If you have the BI Publisher Data Model Developer role, you will be able to create a "data model" in Publisher. Here is the documentation.
Posted by Dylan Wan on November 26, 2016
Here is a checklist for enabling a data source for BI:
1. How does the service perform the authentication?
In the case of ServiceNow, they provided a good example:
curl "https://instance.service-now.com/api/now/table/problem?sysparm_limit=1" \
  --request GET \
  --header "Accept:application/json" \
  --user 'username':'password'
The username and password can be passed as part of the request.
In some other systems, the authentication is done via a separate login service call, and the login call returns a session id and sometimes a cookie.
From the deployment planning perspective, we also need to think about how the user account used to access the REST service will be created and what permissions it should be given.
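In the ServiceNow example above, the credentials ride along on every request via HTTP Basic authentication. A minimal sketch of what that header looks like (the username and password are placeholders):

```python
import base64

username, password = "admin", "secret"  # placeholders

# HTTP Basic auth: base64-encode "user:password" into the Authorization header
token = base64.b64encode(f"{username}:{password}".encode()).decode()
headers = {
    "Accept": "application/json",
    "Authorization": f"Basic {token}",
}

# Session-based systems instead expect a separate login call first, e.g.
# (hypothetical endpoint):
#   POST /api/login  ->  returns a session id and/or a cookie,
# which is then sent on every subsequent request instead of the credentials.
```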
2. How to get the schema definition?
2.1 A list of tables?
Different applications may organize their data in different ways, so we need to know the schema definitions. The source may have a SCHEMA -> TABLE -> COLUMN structure, like how data are organized in an RDBMS such as Oracle. In the case of ServiceNow, they just provide a list of tables. For Oracle Service Cloud, previously known as RightNow, there is good documentation about the list of objects they support: RightNow Core Objects.
Are custom tables allowed? How do we get the names of the custom tables? Is there a separate data dictionary service?
2.2 A list of columns?
Getting the data dictionary is important so you can define the mappings and the query definitions. You do not need to bring in all the data; knowing which data is available is what matters. We need to define what we need and how to select it, or how to filter out the unwanted data. For example, ServiceNow provides WSDL support:
It provides the name and data type of each column:
2.3 Data type mapping
Some source systems have their own special data types. A data type mapping may be required for handling the data from the new data source.
2.4 Nested Structure
JSON and XML formats allow nested structures. A transformation service may be required to break the data from one "document" into multiple "tables". If this type of requirement exists, the data will be staged first before loading.
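A minimal sketch of breaking one nested "document" into two flat "tables" linked by a key (the document and field names are made up):

```python
# A nested document, as a REST API might return it
doc = {
    "id": 101,
    "subject": "Printer broken",
    "comments": [
        {"author": "amy", "text": "Rebooted it"},
        {"author": "bob", "text": "Still broken"},
    ],
}

# Flatten into a parent row and child rows; the relationship that was
# implicit in the nesting becomes an explicit foreign key.
incident_row = {"id": doc["id"], "subject": doc["subject"]}
comment_rows = [
    {"incident_id": doc["id"], "author": c["author"], "text": c["text"]}
    for c in doc["comments"]
]
```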
For example, Oracle RightNow has a special data type called "Menu Type". It basically provides Name-to-ID conversion. If the service API allows retrieving both, it may be easier: you can just retrieve both. If the service API allows retrieving the ID only, you will need to either perform a join when retrieving the data or pull the lookup data separately.
Sometimes the lookup service is available as a separate service call.
2.6 Key of the tables
3. Getting the data
Here are some of the considerations:
Posted by Dylan Wan on September 18, 2016
When we think of a data warehouse, we typically think of having a database that stores the data, and of needing to design the schema and the ETL program to populate that database. If we take the "database" out of the above description, can it still be called a data warehouse?
Posted by Dylan Wan on March 17, 2016
Data warehousing is really about preparing the data for reporting. The assumptions are:
The future of data warehousing depends on whether the above assumptions still hold. Other factors relate to technologies and the source data available. Read the rest of this entry »
Posted by Dylan Wan on December 16, 2015
Team-based security refers to a specific data security requirement scenario.
Team-based security means that the object being secured has a "team" associated with it. The team members can access the object. Read the rest of this entry »
Posted by Dylan Wan on November 30, 2015
The definition of multi-tenancy varies. Some people think that multi-tenancy is just data striping: adding a tenant ID to every table to support multiple tenants.
Posted by Dylan Wan on November 16, 2015
You need to use UNION when you would like to combine several data sets into a single data set. But when exactly do you need to do this in BI or in data warehouse ETL? Here are some real business cases:
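As a refresher on the mechanics before the business cases: UNION (or UNION ALL, which keeps duplicates) stacks two row sets of the same shape into one. A sketch using SQLite, with made-up tables standing in for, say, current and historical orders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two sources with the same shape
cur.execute("CREATE TABLE orders_current (id INTEGER, amount REAL)")
cur.execute("CREATE TABLE orders_history (id INTEGER, amount REAL)")
cur.execute("INSERT INTO orders_current VALUES (3, 30.0)")
cur.execute("INSERT INTO orders_history VALUES (1, 10.0), (2, 20.0)")

# UNION ALL combines both row sets into a single data set
combined = cur.execute(
    "SELECT id, amount FROM orders_current "
    "UNION ALL "
    "SELECT id, amount FROM orders_history "
    "ORDER BY id"
).fetchall()
```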
Posted by Dylan Wan on November 10, 2015
ODI Topology allows you to separate the physical connection from the logical data source by defining a physical schema and a logical schema.
This object may seem redundant during development. However, it is a very useful feature for supporting the Test to Production (T2P) process. Read the rest of this entry »
Posted by Dylan Wan on October 30, 2015
Almost all data warehouses have a date dimension. Its purpose is to provide pre-calculated groupings for dates. It helps roll up the data entered against dates to a higher level, such as year, quarter, month, or week.
In some systems, source files are used to generate the date dimension. IMHO, that makes modifying the logic difficult. In some ETL programs, the task involves various table joins trying to generate the rows for the year range.
This post describes how to populate a table with a row for each date in a given year range. Read the rest of this entry »
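The row-generation step can be sketched without any source files or table joins; the attribute names below are illustrative, not a fixed schema:

```python
from datetime import date, timedelta

def date_dimension(start_year, end_year):
    """Return one row per calendar day in the given year range,
    with pre-calculated groupings for rolling up."""
    d = date(start_year, 1, 1)
    end = date(end_year, 12, 31)
    rows = []
    while d <= end:
        rows.append({
            "date_key": d.strftime("%Y%m%d"),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "week": d.isocalendar()[1],   # ISO week number
            "day_of_week": d.isoweekday(),
        })
        d += timedelta(days=1)
    return rows

rows = date_dimension(2015, 2016)
```

Each dictionary maps naturally to one row of the date dimension table; the same loop can feed INSERT statements directly.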
Posted by Dylan Wan on October 21, 2015
I just added a post about using initialization blocks in OBIEE.
It is a feature I was excited about when I first saw the tool. Read the rest of this entry »
Posted by Dylan Wan on October 17, 2015
Here is a list of features available from Amazon QuickSight:
- Data Source: Connect to supported AWS data sources
- Data Source: Upload flat files
- Data Source: Access third-party data sources
- Data Preparation: Data Preparation Tools
- Visualization: Access all chart types
- Data Access: Capture and Share, Collaborate
- Data Access: API/ODBC connection to SPICE
- Security: Encryption at Rest
- Security: Active Directory Integration
- Security: Fine-grained User Access Control
- Security: Enable Audit Logs with AWS CloudTrail
- Performance: In-memory calculation with SPICE
- Performance: Scale to thousands of users
- Performance: Support up to petabytes of data
I categorize the features into these groups:
These are almost the same features available from other BI tools, such as OBIEE, except for the in-memory engine and perhaps the scalability. Here are some questions I have. Read the rest of this entry »
Posted by Dylan Wan on October 14, 2015
Data Mashup is a new feature from OBIEE 12c.
It is one of the two main features introduced in OBIEE 12c. The other one is the Visual Analyzer.
When I tested the data mashup feature, it supported these two scenarios. Read the rest of this entry »