Taking Injury Risk Forecasting Beyond the Black Box

There is growing interest in the role of predictive analytics in sport, where extensive data collection now provides an exciting opportunity to develop and use such models for medical and performance purposes.

Injury risk forecasting models have traditionally been developed using regression-based approaches, although machine learning (ML) methods are becoming increasingly popular. In parallel with the increased adoption of ML methods, proprietary forecasting models are also emerging, developed by researchers with the aim of making them commercially available.

The research literature has argued that ML methods are often viewed as opaque because the underlying architecture is typically too complex to disentangle all the relationships between risk factors and outcomes. When ML models are not interpretable by humans, or the reasons contributing to injury risk forecasts are not transparent, it has been strongly suggested that ML injury risk forecasting models should be considered negatively as ‘black boxes’. This concern is heightened when such explanatory information is absent from the software systems that deliver the forecasts.

A wide range of ‘interpretable’ black box machine learning models is already being applied proficiently in many aspects of life, specifically when it comes to forecasting the probability of certain events occurring. Examples include weather forecasting, risk management in the insurance industry, financial institutions’ stock performance projections and radiological medical diagnostics.

In this blog, I hope to provide increased confidence that Zone7 is actively operating outside of the opaque ‘Black Box’ categorisation. Our (Zone7’s) engagement process is designed to facilitate transparency and validation via retrospective analysis reports, prospective trial usage and provision of information to aid the user to fully interpret the injury risk forecasts.

In fact, we recently produced a meta case study which, we feel, goes a long way towards addressing many of the issues raised by the narrative that ML injury forecasting solutions should be considered negatively as purely ‘black boxes’.

Zone7’s proprietary injury risk forecasting models have been built by experts in data science, machine learning and software engineering, in close collaboration with sports industry practitioners from the medical, physiotherapy and sports science disciplines. This ensures that expert knowledge and clinical reasoning have been considered when determining associations between potential causal data patterns and injury occurrences.

Utility and Impact on Athlete Management and Care

As I’ve discussed in previous editions of this blog, forecasting injury risk is probabilistic in nature rather than deterministic. In essence, it is an evaluation of injury risk based on what is known, and little if anything in professional sport, whether performance or medical related, is guaranteed.
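To make the probabilistic point concrete, here is a minimal Python sketch. The squad size, the 20% risk figure and the independence assumption are all hypothetical and chosen purely for illustration; none of them comes from Zone7's models or data.

```python
def prob_at_least_one_injury(risks):
    """Probability that at least one athlete is injured,
    assuming (hypothetically) that individual risks are independent."""
    p_none = 1.0
    for r in risks:
        p_none *= (1.0 - r)
    return 1.0 - p_none


# Hypothetical example: ten athletes, each forecast at a 20% injury risk.
squad_risks = [0.2] * 10

expected_injuries = sum(squad_risks)            # long-run average, about 2
p_any = prob_at_least_one_injury(squad_risks)   # about 0.89
p_none = 1.0 - p_any                            # about 0.11

print(f"Expected injuries: {expected_injuries:.1f}")
print(f"P(at least one injury): {p_any:.2f}")
print(f"P(no injuries at all): {p_none:.2f}")
```

Even with every athlete flagged at the same elevated risk, there remains roughly a one in ten chance that nobody is injured, which is why a forecast is best judged over many decisions rather than on a single outcome.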

Zone7 aspires to stimulate the conversation amongst the multi-disciplinary team (MDT) around player management processes by providing ML-derived data insights that are as comprehensive as possible. Whether to apply any of these preventative, intervention-based insights ultimately remains, of course, at the practitioner’s professional discretion.

To ensure Zone7’s utility and positive impact, we aim to help practitioners be as data informed as possible rather than data driven. This includes indicating each player’s level of injury risk, potential injury types and contributing risk factors, all presented to assist the MDT’s thinking on possible intervention and prevention strategies. These insights are based on the association of causal data patterns with historical injury events, which is why our large data lake of 200 million hours of training and game data and 10K+ injuries, harvested from multiple elite sports teams, is so important (a single club cannot leverage enough data alone).

Zone7’s algorithms also suggest potential workload modifications to minimise athlete injury risk. These suggestions are designed to be realistic and appropriate for the specific day in the training microcycle, and to deviate as little as possible from the athlete’s normative values, so that they do not disrupt the training adaptations the athlete needs to thrive.
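The idea of a suggestion that stays as close as possible to an athlete's normative values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_risk` is a made-up stand-in score, not Zone7's model, and the load units, candidate values and risk threshold are hypothetical.

```python
import math


def toy_risk(load, recent_avg_load):
    """Hypothetical stand-in risk score in (0, 1) that rises as the session
    load exceeds the athlete's recent average. Not a validated model."""
    ratio = load / recent_avg_load
    return 1.0 / (1.0 + math.exp(-4.0 * (ratio - 1.5)))


def suggest_load(normative_load, recent_avg_load, max_risk, candidates):
    """Among candidate loads whose toy risk is acceptable, return the one
    closest to the normative load, or None if no candidate qualifies."""
    acceptable = [c for c in candidates
                  if toy_risk(c, recent_avg_load) <= max_risk]
    if not acceptable:
        return None
    return min(acceptable, key=lambda c: abs(c - normative_load))


# Hypothetical athlete: this day type normally carries a load of 700 units,
# but the athlete's recent average has been only 400 units.
candidates = range(300, 801, 50)
suggestion = suggest_load(normative_load=700, recent_avg_load=400,
                          max_risk=0.6, candidates=candidates)
print(suggestion)
```

The design choice worth noting is the objective: the sketch does not minimise risk outright (which would always recommend the lightest session) but picks the acceptable option nearest to what the athlete would normally do.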

Further interventions, such as physiotherapy treatments, are not suggested by Zone7, as these fall outside the scope of the ingested datasets. However, the categorised risk alerts, historically associated injury types and information on the triggers behind an increased injury risk alert allow professionals to form their own opinions on whether non-workload interventions might be appropriate.

Beyond daily alerts and modification recommendations, Zone7 now also provides the ability to input future workloads and explore how they will affect an individual athlete’s injury risk status. This, we feel, hugely increases Zone7’s utility beyond a simple ‘black box’.
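A 'what-if' exploration of this kind can be sketched as follows. The sketch uses the acute:chronic workload ratio, a simple and publicly known heuristic, purely as a stand-in indicator; it is not Zone7's forecasting model, and the load values, window lengths and function names are hypothetical.

```python
def acute_chronic_ratio(loads, acute_days=7, chronic_days=28):
    """Ratio of the mean load over the last `acute_days` to the mean load
    over the last `chronic_days`. Assumes positive daily loads."""
    acute = loads[-acute_days:]
    chronic = loads[-chronic_days:]
    return (sum(acute) / len(acute)) / (sum(chronic) / len(chronic))


def what_if(history, planned):
    """Return the stand-in indicator after each planned day is appended
    to the athlete's load history, without modifying the history."""
    loads = list(history)
    trajectory = []
    for load in planned:
        loads.append(load)
        trajectory.append(acute_chronic_ratio(loads))
    return trajectory


# Hypothetical athlete with four steady weeks at 400 units per day.
history = [400.0] * 28

steady = what_if(history, [400.0] * 3)   # keep the plan unchanged
spike = what_if(history, [800.0] * 3)    # double the load for three days

print([round(x, 2) for x in steady])
print([round(x, 2) for x in spike])
```

Comparing the two trajectories side by side shows how a practitioner could test alternative plans before committing to one: the steady plan leaves the indicator flat, while the spike pushes it upwards day after day.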

Conclusion

We understand the concerns many in the industry have about opaque black box prediction tools, but we are working hard with credible elite expert partners to actively address many of these concerns.

Zone7 continually refines not only its algorithmic performance but also its utility and positive impact through close collaboration with all the teams and professionals that use the system. Feedback in any form is gratefully received and sincerely sought for the greater good of athlete performance and welfare.

By helping sports performance and medicine professionals to be as fully data informed as possible, Zone7 aspires to contribute to practitioners’ psychological certainty around the daily decisions they make, and to move the capability of ML technology in this context forward.