By Kato Mivule
Any data analytics project has to give serious consideration to a number of challenges. Below, we identify and classify five main challenges that data analytics specialists are likely to encounter: (i) the problem definition challenge, (ii) the data preprocessing challenge, (iii) the big data challenge, (iv) the unstructured data challenge, and (v) the evaluation of results challenge.
The problem definition challenge
- Many data analytics problems are not well defined by stakeholders. They therefore require collaboration between domain and technical specialists to determine the data to be used, the expected outcomes, and the algorithms needed to accomplish the tasks.
The data preprocessing challenge
- Most of the datasets presented for data analytics come incomplete and not in the format required to properly apply data analytics algorithms. During the data preprocessing stage, the data has to be put into a format suitable for the data mining and machine learning algorithms.
- Missing values: during the preprocessing phase, missing values are commonly replaced with the attribute's average value (for numeric data) or its most frequent value.
- Noisy data: incorrect and invalid values have to be identified and either removed or replaced.
- Irrelevant values: values that offer no insight into the problem being solved are removed at this stage.
- Outliers: these are values so high or so low that they distort the overall results of the analysis. For example, in a dataset containing the salaries of both the CEO and the janitor, the average salary may not reflect what a typical worker in that organization earns. Yet simply removing the outliers would discard potentially valuable data, which is what makes them a challenge for data analytics.
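The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library (Python 3.8+ for `statistics.quantiles`), not a complete cleaning pipeline. The helper names `mark_invalid`, `impute_missing`, and `flag_outliers` are hypothetical. Invalid values are assumed to be detectable with a simple rule supplied by the analyst. Outliers are flagged with Tukey's 1.5 x IQR fences, which is one common rule among several. Flagging rather than deleting keeps the outliers available for later inspection.

```python
from statistics import mean, median, mode, quantiles

def mark_invalid(values, is_valid):
    """Noisy data: replace values that fail a validity rule with None."""
    return [v if v is not None and is_valid(v) else None for v in values]

def impute_missing(values):
    """Missing values: fill None with the mean (numeric data) or the most frequent value (other data)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = mode(present)
    return [fill if v is None else v for v in values]

def flag_outliers(values, k=1.5):
    """Outliers: mark values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR) without removing them."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

# Hypothetical example data: ages with an impossible entry and a gap.
ages = impute_missing(mark_invalid([34, -5, None, 41], lambda a: 0 <= a <= 120))
print(ages)  # [34, 37.5, 37.5, 41]

# Hypothetical salaries: seven staff members plus one CEO.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 42_000, 45_000, 1_000_000]
print(mean(salaries), median(salaries))  # 157750 39000.0
print(flag_outliers(salaries))           # only the last value is flagged
```

In this made-up salary data, the single CEO salary pulls the mean to 157,750, while the median stays at 39,000. This is the CEO-and-janitor effect described above. The outlier is flagged but kept, so the analyst can still decide how to treat it.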
The big data challenge
- The exponential growth of data on a daily basis presents an ever-growing challenge for data analytics in terms of the computational resources needed to analyze such datasets.
- The big data problem includes both the sheer size of datasets and their high dimensionality, i.e., the large number of attributes or variables in a given dataset.
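One way to keep computational resources in check as data grows is to compute summary statistics in a single pass, without loading the whole dataset into memory. The sketch below uses Welford's online algorithm for the mean and sample variance. The `RunningStats` class and the `summarize_stream` function are hypothetical names. Any iterable, such as a file reader, a database cursor, or a generator, is assumed to stand in for the large data source. The sketch addresses only the size of the data, not its dimensionality.

```python
from typing import Iterable

class RunningStats:
    """Welford's online algorithm: mean and sample variance in one pass, O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

def summarize_stream(values: Iterable[float]) -> RunningStats:
    """Consume values one at a time; the full dataset is never held in memory."""
    stats = RunningStats()
    for x in values:
        stats.update(x)
    return stats

# range() is lazy, so a million values are processed without building a list.
stats = summarize_stream(range(1, 1_000_001))
print(stats.n, stats.mean)
```

Memory use stays constant however many values arrive. The same update can run over data that is read chunk by chunk from disk or over a network.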
The unstructured data challenge
- Traditional data analytics has typically worked with structured data of well-defined types such as text, numeric, and date fields. Recently, however, with the rapid growth in use of the internet, data is often stored directly in data warehouses in unstructured formats, including multimedia such as images and video, GIS data, and so on.
- Unstructured data is therefore a challenge for data analytics because it has to be preprocessed into the right format before the analytics process can begin.
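As a minimal sketch of turning unstructured data into a structured format, the Python below converts free-text documents into a bag-of-words table, with one row per document and one column per term. It handles plain text only. Images, video, and GIS data would need their own feature-extraction steps, which are not shown. The function name `text_to_features` and the simple lowercase-and-split tokenizer are illustrative assumptions.

```python
import re
from collections import Counter

def text_to_features(documents, vocabulary=None):
    """Convert free-text documents into rows of term counts (a bag-of-words table)."""
    # Lowercase and keep runs of letters, digits and apostrophes as tokens.
    tokenized = [re.findall(r"[a-z0-9']+", doc.lower()) for doc in documents]
    if vocabulary is None:
        # Build the column headers from every distinct term, in sorted order.
        vocabulary = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocabulary])  # absent terms count as 0
    return vocabulary, rows

vocabulary, rows = text_to_features(["Data is big", "big data, big challenge"])
print(vocabulary)  # ['big', 'challenge', 'data', 'is']
print(rows)        # [[1, 0, 1, 1], [2, 1, 1, 0]]
```

Passing a fixed `vocabulary` lets new documents be mapped onto the same columns, so they line up with the table used for earlier analysis.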
The evaluation of results challenge
- Another challenge, related to the problem definition challenge, is the evaluation of results. In both cases, users or stakeholders often do not state precisely what they want to analyze, and even when results are presented, technical experts face the challenge of interpreting and visualizing them in a way that is meaningful to the stakeholders.
- Result interpretation: results that data analytics specialists interpret correctly may still seem meaningless to clients.
- Result visualization: large datasets present a visualization challenge, so data analytics specialists have to present results in a succinct, understandable, and meaningful way to the users.
- The data analytics specialist has to find the balance between presenting too much and too little information.
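One simple way to balance too much information against too little is to report only the largest categories, fold the rest into a single "Other" entry, and show each entry's share of the total. The sketch below assumes the results are category counts. The function names and the sample traffic data are hypothetical.

```python
from collections import Counter

def summarize_for_report(counts, top_k=3, other_label="Other"):
    """Keep the top_k largest categories and combine the remainder into one entry."""
    ranked = Counter(counts).most_common()  # sorted by count, largest first
    summary = dict(ranked[:top_k])
    remainder = sum(count for _, count in ranked[top_k:])
    if remainder:
        summary[other_label] = remainder
    return summary

def format_lines(summary):
    """Render each entry with its count and its percentage of the total."""
    total = sum(summary.values())
    return [f"{label}: {count} ({count / total:.0%})" for label, count in summary.items()]

# Hypothetical result set: website visits by traffic source.
visits = {"email": 120, "search": 300, "social": 80, "direct": 200, "referral": 15, "ads": 5}
for line in format_lines(summarize_for_report(visits)):
    print(line)
# search: 300 (42%)
# direct: 200 (28%)
# email: 120 (17%)
# Other: 100 (14%)
```

The `top_k` parameter is the control for striking the balance: raise it for a technical audience and lower it for a summary slide. Because the percentages are rounded, they may not add up to exactly 100.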
References
Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”, Prentice Hall, 2003, pp. 9-10.
David Boulton and Martyn Hammersley, “Analysis of unstructured data”, Chapter 10 in Roger Sapsford and Victor Jupp (eds.), “Data Collection and Analysis”, SAGE, 2006, ISBN: 9780761943631, p. 243.
Nathalie Japkowicz and Jerzy Stefanowski, “Big Data Analysis: New Algorithms for a New Society”, Studies in Big Data, Vol. 16, Springer, 2015, ISBN: 9783319269894, pp. 4-10.
Jiawei Han, Jian Pei, and Micheline Kamber, “Data Mining, Southeast Asia Edition”, The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann, 2006, ISBN: 9780080475585, p. 47.