Data Analytics Simplified – A Tutorial – Part 5 – The Data Analytics Process
By Kato Mivule
The data analytics process involves using algorithms to extract, mine, and discover meaningful information patterns and knowledge in data.
However, the data would have to undergo a series of transformational processes before meaningful information patterns can be extracted.
There are five main phases in the data analytics process:
- The first step of the data analytics process is to articulate the problem to be solved and the questions to be answered.
- The question or problem to be solved has to be domain specific.
- This helps with the correct data selection process. For example, local grocery stores might want to use traffic pattern data to predict when customers with cars are most likely to stop by a certain store.
- The second step of the process is to select an appropriate dataset based on a specific domain.
- The third step is to transform the selected dataset into a format that is most appropriate for analytics algorithms.
- This process is also called data cleaning, in which missing values are removed or replaced with averages. Data outliers may be removed or replaced with appropriate values. Data with incorrect or inconsistent data types are also corrected at this stage.
- Data analytics work involves considerable time spent preprocessing data to ensure correct analysis.
- Data from different sources is then converted into a common format for analysis.
- This could include reducing the data into appropriate sample sizes, adding labels for classification and changing file types to make the data suitable for analytics tools.
- The fourth step is to choose appropriate data mining and machine learning algorithms.
- The analyst can then choose to employ supervised or unsupervised learning algorithms.
- In some cases, depending on the problem formulation, both supervised and unsupervised learning algorithms will be chosen.
- However, parsimony is important – keep it simple. You don’t need to implement all algorithms to extract meaningful knowledge from the data and come up with a correct model.
- The fifth step is to evaluate the results produced by the data mining algorithms.
- The extracted knowledge is then presented to the stakeholders in a clear manner.
- Visualization of results is done at this stage.
- A report that analyzes and interpretsthe results to convey their meaning is also prepared at this stage.
- Again, parsimony is important. A concise and understandable visualization of results is preferred.
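
To make the phases above more concrete, here is a minimal Python sketch built around the grocery store question from the first step. Everything in it is an assumption made for illustration: the `raw_rows` data is invented, the outlier rule (drop counts above five times the median) is just one simple choice, and a straight-line least squares fit stands in for whatever algorithm an analyst would actually pick. It only uses Python's standard library.

```python
from statistics import mean, median

# Hypothetical raw records (hour of day, cars stopping at the store), read as
# strings from a file. An empty string marks a missing value.
raw_rows = [("8", "12"), ("9", "14"), ("10", ""), ("11", "18"),
            ("12", "400"), ("13", "22"), ("14", "24")]


def clean(rows, max_ratio=5.0):
    """Fix data types, drop outliers, and replace missing counts with the average."""
    parsed = [(int(hour), float(count) if count else None) for hour, count in rows]
    typical = median(count for _, count in parsed if count is not None)
    # Assumed outlier rule: drop counts above max_ratio times the median.
    kept = [(h, c) for h, c in parsed if c is None or c <= max_ratio * typical]
    average = mean(c for _, c in kept if c is not None)
    return [(h, average if c is None else c) for h, c in kept]


def fit_line(points):
    """Ordinary least squares fit of count = slope * hour + intercept."""
    mean_x = mean(x for x, _ in points)
    mean_y = mean(y for _, y in points)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x


cleaned = clean(raw_rows)
train, test = cleaned[:-2], cleaned[-2:]      # hold out the last two hours
slope, intercept = fit_line(train)

# Evaluation: mean absolute error on the held-out rows.
mae = mean(abs(slope * x + intercept - y) for x, y in test)
print(f"Cars per hour grow by about {slope:.1f}; held-out error is about {mae:.1f} cars.")
```

The sketch keeps to one simple model and a one-line summary, in the spirit of parsimony. A real project would work with far more data and would likely rely on dedicated analytics libraries rather than hand-written functions.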