Data Analytics Simplified – A Tutorial – Part 3
By Kato Mivule
Keywords: Supervised Learning, Unsupervised Learning
As we saw in the previous post, data analytics can be broken down into two main categories: predictive and descriptive data analytics. Data analytics tasks can further be categorized as follows:
Supervised learning tasks: These involve algorithms that group data into classes and make predictions based on previous examples, thus learning by example. In other words, data is grouped into predetermined classes based on a history of previously categorized data. The data in supervised learning is always labeled (with classes) and is basically divided into:
- Training data – which provides the labeled examples of how to group the data, and is used to build a model for grouping future data.
- Testing data – which is held back from training and used to check how well the model's predictions match the known labels.
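To make the split between training and testing data concrete, here is a minimal Python sketch. It assumes the labeled data is a list of (feature value, class label) pairs; the function name `train_test_split`, the 25% test fraction, and the sample records are hypothetical choices for illustration (scikit-learn offers a similarly named function, but this is a hand-written stand-in, not its API).

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle labeled records and split them into training and testing sets."""
    shuffled = list(records)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled data: (feature value, class label) pairs.
labeled = [(1.0, "small"), (1.5, "small"), (2.0, "small"), (8.0, "large"),
           (8.5, "large"), (9.0, "large"), (1.2, "small"), (9.3, "large")]

train, test = train_test_split(labeled)
print(len(train), len(test))  # 6 2
```

Shuffling before splitting helps keep both sets representative when the original records happen to be sorted by class.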
Unsupervised learning tasks: These involve algorithms that do not need predetermined classes to categorize the data. Instead, the data itself determines the groups, or clusters, into which similar data values collect.
- Predictive data analytics methods are largely supervised learning methods.
- Descriptive data analytics methods are mainly unsupervised learning methods.
Briefly, predictive data analytics includes the following three main methods (more details will follow in later posts):
- Classification analysis
Classification involves assigning data items to predefined categorical classes or targets, or predicting which of those classes a new item belongs to. The classes into which the data is grouped are chosen before the data is analyzed, based on the characteristics of the data.
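The tutorial does not name a particular classification algorithm, so the sketch below uses 1-nearest-neighbour purely as one simple example: a new item gets the class label of the closest training example. The function names and the small data sets are hypothetical. The `accuracy` helper shows the role of testing data, which is to compare predictions against known labels.

```python
def nearest_neighbor_classify(training_data, x):
    """Predict the class of x as the label of the closest training example (1-NN)."""
    _, closest_label = min(training_data, key=lambda pair: abs(pair[0] - x))
    return closest_label

def accuracy(training_data, testing_data):
    """Fraction of testing examples whose predicted class matches the known label."""
    correct = sum(nearest_neighbor_classify(training_data, x) == label
                  for x, label in testing_data)
    return correct / len(testing_data)

# Hypothetical labeled examples: (feature value, class label).
training_data = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
                 (8.0, "large"), (8.5, "large"), (9.0, "large")]
testing_data = [(1.2, "small"), (9.3, "large")]

print(nearest_neighbor_classify(training_data, 1.7))  # small
print(nearest_neighbor_classify(training_data, 7.2))  # large
print(accuracy(training_data, testing_data))          # 1.0
```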
- Regression analysis
Regression involves predicting numerical (continuous) target values for data items, based on a mathematical function whose parameters are fitted to the training data, such as a linear or logistic regression function.
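As a minimal sketch of the linear case, the code below fits y = slope · x + intercept by ordinary least squares, where slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The function name `fit_linear` and the sample points are hypothetical; the points were chosen to lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # hypothetical data lying on y = 2x + 1
slope, intercept = fit_linear(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # 13.0, the predicted value for x = 6
```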
- Time series analysis
Time series analysis involves examining the values of a data attribute that have been recorded over a period of time, in chronological order. Predictions are made based on the history of the values observed over time.
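One of the simplest ways to predict from a value's history is a moving-average forecast: the next value is estimated as the mean of the most recent observations. The sketch below assumes the series is already in chronological order; the window size of 3 and the monthly sales figures are hypothetical, and real time series methods are usually more elaborate.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]  # the most recent observations
    return sum(recent) / window

monthly_sales = [120, 130, 125, 140, 150, 145]  # hypothetical, oldest first
print(moving_average_forecast(monthly_sales))  # 145.0
```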
Descriptive data analytics includes the following unsupervised methods (more details will follow in later posts):
- Clustering
Clustering involves grouping data into classes, but without any predefined or predetermined classes or targets. The classes into which the data is grouped are determined by the data itself, so that similar items collect in the same cluster.
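A common clustering algorithm is k-means, sketched below for one-dimensional values. Note that k-means needs the number of clusters and some starting centroids up front, though it never needs class labels. The function name `k_means_1d`, the initial centroids, and the data values are hypothetical choices for illustration.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values with k-means, starting from the given initial centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _c in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster's mean (keep it if empty).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # nothing moved, so the clusters are stable
            break
        centroids = new_centroids
    return clusters, centroids

values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.9, 10.1]
clusters, centroids = k_means_1d(values, centroids=[0.0, 4.0, 8.0])
print(clusters)  # [[1.0, 1.2, 0.8], [5.0, 5.3, 4.7], [9.9, 10.1]]
```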
- Summarization
Summarization is the generalization of data into groups that are described with descriptive statistics such as the mean, median, and mode.
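A minimal sketch of summarization, assuming the data arrives as (group, value) records: values are collected per group and each group is described by its mean, median, and mode using Python's standard `statistics` module. The function name and the order records are hypothetical.

```python
from statistics import mean, median, mode

def summarize_by_group(records):
    """Describe the numeric values of each group with the mean, median, and mode."""
    groups = {}
    for group, value in records:
        groups.setdefault(group, []).append(value)
    return {g: {"mean": mean(v), "median": median(v), "mode": mode(v)}
            for g, v in groups.items()}

# Hypothetical (region, order size) records.
orders = [("north", 2), ("north", 4), ("north", 4),
          ("south", 1), ("south", 3), ("south", 3), ("south", 5)]
summary = summarize_by_group(orders)
for region, stats in summary.items():
    print(region, stats)
```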
- Association Rules
Association rules are IF-THEN rules that describe relationships among data items that tend to occur together, so that data with similar relationships can be categorized into the same groups.
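Association rules are commonly judged by two measures: support (the share of all transactions containing both the IF and THEN items) and confidence (the share of transactions containing the IF items that also contain the THEN items). The sketch below only evaluates one candidate rule; it does not search for rules the way mining algorithms such as Apriori do. The function name and the shopping baskets are hypothetical.

```python
def rule_support_confidence(transactions, if_items, then_items):
    """Support and confidence of the rule IF if_items THEN then_items."""
    if_items, then_items = set(if_items), set(then_items)
    n = len(transactions)
    n_if = sum(1 for t in transactions if if_items <= set(t))
    n_both = sum(1 for t in transactions if (if_items | then_items) <= set(t))
    support = n_both / n
    confidence = n_both / n_if if n_if else 0.0  # avoid dividing by zero
    return support, confidence

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]
# Rule: IF bread THEN butter
print(rule_support_confidence(baskets, {"bread"}, {"butter"}))  # (0.5, 0.6666666666666666)
```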
- Sequence Discovery
Similar to time series analysis, sequence discovery (also called sequential pattern analysis) involves finding related patterns in sequential data, that is, data in chronological order, based on statistical properties.
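As a deliberately simplified sketch of sequence discovery, the code below counts how many sequences contain each adjacent step "a then b" and keeps the steps that appear in at least `min_count` sequences. Full sequential pattern mining also finds longer and non-adjacent patterns; the function name, threshold, and page-visit sessions here are hypothetical.

```python
from collections import Counter

def frequent_consecutive_pairs(sequences, min_count=2):
    """Keep adjacent (a, b) steps that occur in at least min_count sequences."""
    counts = Counter()
    for seq in sequences:
        # set() so each pair is counted at most once per sequence
        for pair in set(zip(seq, seq[1:])):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Hypothetical page-visit sessions, each in chronological order.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["search", "product", "cart", "checkout"],
]
print(sorted(frequent_consecutive_pairs(sessions).items()))
# [(('product', 'cart'), 3), (('search', 'product'), 2)]
```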