By Dr Brian Moore, Prof John Newell and Dr John Kiely
TLDR: Our machine learning model correctly identified in advance 85% of hamstring injuries in elite male footballers with 86% specificity – this is a world-leading breakthrough.
We are pleased to announce the Orreco Motion Data Collective of research scientists and clinicians and share our early research findings. Our group consists of 11 PhD/MDs and high performance practitioners from biostatistics, data science, biomechanics, sports science, physiotherapy, sports medicine, coaching, strength and conditioning and rehabilitation.
Here we explain our rationale and share the early results of our research with specific reference to hamstring injury risk in association football (soccer). Hamstring injury risk was chosen given the frequency and high cost per injury to football clubs. We also have preliminary research underway in basketball and American football.
The rate and cost of injuries
Injuries diminish the likelihood of competitive success, reduce players' market value and shorten, compromise or even end playing careers. Despite significant investments in research and development, staff recruitment, staff training and tracking and monitoring technologies, football injuries fell by only 3% between 1999 and 2019. Notably, in the first six months of the 2021/22 season, injuries once again increased across each of the five major European leagues, costing €279.49m in salaries paid to unavailable players (1). This rise may be attributable to the truncated seasons, limited off-seasons and fixture congestion caused by the recent pandemic.
Moving the dial
Injuries emerge under the influence of multiple interacting factors. Existing monitoring approaches typically employ isolated measures of load (e.g. GPS) and/or load response (e.g. jump testing or biomarkers, either in isolation or combination) to identify 'risk'. GPS monitoring, for example, captures specific movement variables (distance, speed, acceleration and deceleration) to deploy as surrogate descriptors of externally imposed load. These metrics are widely interpreted as indicating injury risk. However, analysis using appropriate statistical techniques has shown poor predictive ability when using these derived metrics.
The challenge of using machine learning
There is significant interest in the potential of machine learning tools to help identify injury risk in professional sport. Some recent commentary (2-5) has raised concerns about the use of such techniques, specifically uninterpretable black box injury risk prediction algorithms. Equally, a small number of studies suggest that machine learning techniques do have potential in this field (6,7).
We agree completely with the challenges these recent publications have identified, which need to be overcome in elite sport – namely relatively small sample sizes, low injury incidence and data overfitting. Uninterpretable black box models without insight into what is driving the signal are of little use. In our opinion, commercial companies that report predictive ability (i.e. sensitivity) in marketing materials and media reports but omit the specificity (which determines the rate of false flags) of their risk models are not helping high performance practitioners.
In effect, acting on false flags, by introducing remedial actions such as reducing player loads or resting players, may reduce immediate risks. But, by decreasing player conditioning levels, it can also increase longer-term injury risk as players are not exposed to enough load to protect them by building the required resilience.
High false positive rates and, even more troublingly, unreported false positive rates are creating an unhealthy and disruptive decision-making environment for practitioners. Ultimately, when everything is an alarm, the real signal is lost.
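To make the point about unreported specificity concrete, the short sketch below counts the alerts a practitioner would expect to see at two different specificities. Every number in it is a hypothetical assumption chosen for illustration (the sensitivity, both specificities, the 2% share of windows that precede an injury and the 1,000 player-match windows); none of it is data from our research, and the function name is ours.

```python
def expected_alerts(sensitivity, specificity, prevalence, n_windows):
    """Return expected (true_alerts, false_alerts) over n_windows player-windows.

    prevalence is the assumed fraction of windows that precede an injury.
    """
    injured = prevalence * n_windows
    healthy = n_windows - injured
    true_alerts = sensitivity * injured            # injuries correctly flagged
    false_alerts = (1.0 - specificity) * healthy   # healthy windows wrongly flagged
    return true_alerts, false_alerts


if __name__ == "__main__":
    # Hypothetical scenario: same sensitivity, two different specificities.
    for spec in (0.50, 0.90):
        tp, fp = expected_alerts(0.80, spec, 0.02, 1000)
        print(f"specificity={spec:.2f}: {tp:.0f} true alerts, {fp:.0f} false flags")
```

Two models with identical sensitivity can impose very different alert burdens, which is why a sensitivity figure quoted on its own tells a practitioner little.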
Orreco specialises in analysing big data correctly
At Orreco, since our founding, we have been helping our clients in professional sport analyse large scale datasets. We have been actively researching the potential of statistical modelling and machine learning for injury risk and decision-making support for the past ten years.
Five years ago, we accelerated our efforts following a career-threatening non-contact injury suffered by an athlete, when all available data were reported as within expected thresholds. Our efforts have been further supported by the growing availability of high fidelity camera outputs from companies including ChyronHego, Second Spectrum, Pixellot and other providers. We also added deep domain expertise in biomechanics to our group.
Our team began investigating the relationships between high fidelity camera outputs, accelerometry, biomarkers, injury incidence and related datasets. As our pilot analysis produced such promising results, we invested heavily in our software architecture, added elite practitioners to the team and sought guidance from academic and clinical partners.
The power of motion signal analysis
Players' movement characteristics are clearly influenced by context – variables such as weather conditions, game priority, the clash of opposing tactics, fixture congestion and score-line can all play a part. Remarkably, however, each player also exhibits highly consistent individual movement repertoires – repetitively deployed sets and sequences of self-similar movement patterns. These habituated patterns reflect player preferences, predispositions, biases, habits and aversions.
Importantly, the core characteristics of these patterns can be captured and statistically modelled to create individual movement signatures that display remarkable structural resilience, remaining consistent within games and across multiple games.
Although players can select from an expansive range of available movement choices to respond to the demands imposed by specific game situations, these choices are heavily influenced by personal biases, idiosyncrasies, predispositions and prior histories developed and engrained over years of training and competition.
In effect, each player, over time, develops a personalised motion signal – a set of habitually recurring movement characteristics and sequences including patterns of runs, turns, accelerations and decelerations executed at varying intensities. These signatures, although persistently adapted and modified to respond to specific game events, remain remarkably consistent during games and across seasons. They typically deviate outside preferred movement bandwidths only following some perturbing event or sequence of events – for example, the accumulation of fatigue, painful sensitisation, psycho-emotional distress and/or concussive events.
Crucially, such deviations reflect a shift in the player's capacity to execute habituated movement patterns. When compelled to operate outside accustomed manoeuvrability bandwidths, players manage loads using unaccustomed movement strategies. This transition, from habituated to unhabituated solutions, imposes unaccustomed mechanical stress, increases movement error, diminishes movement smoothness and drives a creeping cascade of negative events serving to escalate injury likelihood.
In short, deviations outside of habituated movement bandwidths reflect risk.
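As a purely illustrative sketch of what "deviation outside a habituated bandwidth" could look like in code, the example below builds a band around one player's historical values of a single movement feature and flags recent values that fall outside it. This is not our model: the single feature, the mean plus or minus two standard deviations band, the function names and all the numbers are simplifying assumptions made for illustration.

```python
from statistics import mean, stdev


def bandwidth(baseline, k=2.0):
    """Return (low, high) limits as the baseline mean +/- k sample standard deviations."""
    m, s = mean(baseline), stdev(baseline)
    return m - k * s, m + k * s


def flag_deviations(baseline, recent, k=2.0):
    """Return one boolean per recent value: True if it lies outside the band."""
    low, high = bandwidth(baseline, k)
    return [not (low <= x <= high) for x in recent]


if __name__ == "__main__":
    # Hypothetical per-match values of one movement feature for one player.
    baseline = [4.1, 3.9, 4.0, 4.2, 3.8, 4.0, 4.1, 3.9]
    recent = [4.0, 4.6, 3.2]
    print(flag_deviations(baseline, recent))
```

A real movement signature would involve many features and their sequencing rather than one number, so a per-feature band like this should be read only as the simplest instance of the idea.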
Results of our research
We have built a predictive model, with interpretable features, using a combination of computationally intensive statistical models and machine learning algorithms. The model correctly identified 85% of hamstring injuries (i.e. sensitivity), when applied to data independent of the model building process, with 86% specificity.
The results show the impressive predictive ability of the model as it provides an immediate and short-term signal for player injury risk. The accurate forecasting of 85% of impending hamstring injuries with such a low incidence of false flag events is a world-leading breakthrough.
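For readers less familiar with the two metrics quoted above, the sketch below shows how sensitivity and specificity are computed from held-out labels and model flags. The label lists are invented toy values, not our study data, and the function name is ours.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) from binary labels and binary flags.

    1 means an injury followed (y_true) or the model raised a flag (y_pred).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # injuries flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # injuries missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false flags
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy held-out set: 4 injury windows and 6 non-injury windows.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The important detail is that both figures are computed on data the model did not see during building, and that both are reported together.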
We are actively working with three professional franchises to advance this research together. We will publish our research in due course, while understandably maintaining a competitive advantage for our clients. Other non-contact injuries, including ACL injury, are also being actively investigated.
Our team will be approaching members of the applied science community to expand the collective in due course. If you are a professional franchise and interested in joining our research collective, please email: firstname.lastname@example.org
Motion Data Collective Members
Prof John Newell
Dr Jaynal Abedin
Dr Brian Moore
Dr John Kiely
Dr Daniel Cohen
Dr Andrew Simpkin
Dr Kenny McMillan
Dr Andy Barr
Dr Colm O’Riordan
Pouyan Nejadi MSc
Matt McGrath MSc
Plus Physicians, Scientists and Coaches at our pilot teams.
1. The 2021/22 European Football Injury Index. Mid-season update March 2022. Howden Sports and Entertainment.
2. Claudino et al. Current approaches to the use of artificial intelligence for injury risk assessment and performance prediction in team sports: a systematic review. Sports Medicine 2019.
3. Martin et al. Machine learning in sports medicine: need for improvement. J ISAKOS 2021.
4. Bullock et al. Black box prediction methods in sports medicine deserve a red card for reckless practice: A change of tactics is needed to advance athlete care. Sports Medicine 2022.
5. Van Eetvelde et al. Machine learning methods in sport injury prediction and prevention: a systematic review. J Exp Orthop 2021.
6. Majumdar et al. Machine learning for understanding and predicting injuries in football. Sports Medicine 2022.
7. Nassis et al. A review of machine learning applications in soccer with an emphasis on injury risk. Biology of Sport 2022.