Advances in industrial digital technologies have led to an increasing volume of data generated from industrial bioprocesses, which can be utilised within data-driven models (DDMs). However, the volume and variability of these data make it challenging to develop models that capture the underlying biological nature of the bioprocesses. In this study, a framework for developing data-driven models of bioprocesses is proposed and evaluated by modelling an industrial bioprocess that treats industrial or agrifood wastewaters whilst simultaneously generating bioenergy. Six models were developed to predict the reduction in the chemical oxygen demand of the wastewater achieved by the bioprocess, and were statistically evaluated using both testing data (data randomly partitioned from the model development data set) and unseen data (new data not used during model development). The statistical error metrics employed were the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The stacked neural network model was best able to model the bioprocess, having the highest accuracy on both the testing data (R²: 0.98; RMSE: 1.29; MAE: 2.27; MAPE: 4.08) and the unseen data (R²: 0.82; RMSE: 2.57; MAE: 1.75; MAPE: 3.68). Data visualisation is used to observe (or confirm) whether new data points lie within the model boundaries, helping to increase confidence in the model's predictions on future data. All rights reserved, Elsevier.
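
The four error metrics named above can be computed from paired observed and predicted values. The following is a minimal sketch using their standard textbook definitions, not the implementation used in the study; the function name `regression_metrics` and the example values are illustrative assumptions, and MAPE is expressed here as a percentage.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (in percent) for paired observations.

    Hypothetical helper for illustration; uses the standard definitions.
    Assumes no observed value is zero (MAPE is undefined otherwise) and
    that the observed values are not all identical (R2 is undefined then).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Made-up values purely for illustration (not data from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
```

Under these definitions, MAE can never exceed RMSE on the same data set, which offers a quick consistency check when comparing the two.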