ML DoF: Finding Arsenal a Versatile Centre Back with Principal Component Analysis.
Crunching numbers to find who's crunching tackles.
In previous posts, we’ve explored ways to enhance Arsenal’s attack, from clinical strikers to hybrid forwards who can stretch play and dribble-first attacking midfielders who can break down stubborn deep blocks. Now our focus shifts to the defense.
Arsenal’s 24/25 defensive group was one of the few consistent bright spots in an otherwise volatile season. Despite losing Ben White, Tomiyasu, and Calafiori for long stretches, and Gabriel late on, we still posted elite numbers:
Best xGA (34.4)
+25.5 xGD, second only to Liverpool
A league-best GA of 34 (by 7 goals)
A league-best xGA/90 of 0.9, even with multiple rotations at the back
That kind of consistency doesn’t happen by accident. Gabriel and Saliba remain one of the most complete centre-back pairings Arsenal have ever had. The full-back group, featuring White and Timber on one side, Calafiori and MLS on the other, is arguably deeper than what City or Liverpool can field. Kiwior stepped up with several strong performances as Gabriel's backup and looks to have earned Arteta’s trust after a shaky first year.
But Saliba has now played over 3,000 minutes in consecutive seasons. By May, the signs of fatigue were clear. Mistakes started to creep in, and the sharpness in one-on-one duels dipped. This is where the squad is exposed. There is no like-for-like cover who can step in for Saliba without a structural reshuffle.
That’s why finding a versatile right-sided centre-back who can also play at right-back is a top priority. Someone who can take the load off both Saliba and White, contribute in build-up, and offer tactical flexibility in a back three. Not a headline signing, but a crucial one.
Scope & Data
To identify players who fit this profile, I’ve used Principal Component Analysis (PCA) to reduce dimensionality and identify underlying patterns across defensive and possession metrics for centre-backs and right-backs in the top five European leagues.
The approach for this piece is slightly different from previous posts. Rather than building a classification model, PCA allows us to summarise complex data into key components that capture the most variance. That makes it easier to spot players with similar profiles to the hybrid role we're targeting.
The process looked like this:
Building the Dataset
Pulled all players listed as centre-backs and defensive full-backs on FBref who played at least 1350 minutes in the 2024/25 season. This ensured a large enough sample and consistent statistical base.
Finding the Right Metrics
Focused on metrics that reflect the responsibilities of a modern rotation CB/RB: on-ball progression, defensive reliability, and versatility in buildup.
Key stats:
Tackles, interceptions, and blocks (per 90)
Ground and aerial duel win %
Progressive passes and carries
Pass completion % under pressure
Ball recoveries and touches in defensive and middle thirds
Applying PCA
Ran PCA on the standardised feature set to identify the key components driving variance in the data, reducing the metric set to a few principal axes that distinguish player types and styles.
Used a refined feature set based on exploratory data analysis (EDA) to make sure the model identifies the right player profile.
Validating and Interpreting Results
Mapped players in the reduced feature space and clustered them based on proximity. This allowed us to identify defenders with similar underlying traits to Arsenal’s current CB/RB hybrid types, as well as spot potential outliers who bring something different.
Building a Dataset & Selecting Metrics
As part of this project, I built a scoring system to assess players across four key areas: passing, defending, creation, and shooting. This helped segment players based on their playing styles and overall contributions.
For this exercise, the focus is on defenders who can operate in multiple zones rather than traditional centre-backs who sit deep or full-backs who only overlap. Arsenal’s structure relies on defenders who can step into midfield, support buildup, and handle transitional defending in wide areas.
To refine the dataset, I filtered for:
Centre-backs and right-backs with at least 1350 minutes played in the 2024/25 season
Above-average metrics in both defensive actions and ball progression
This narrowed the pool to players who bring composure in buildup, range in positioning, and consistency out of possession. Exactly the kind of profile Arsenal needs to protect against fatigue or injury without compromising tactical fluidity.
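The two filters above can be sketched in a few lines of pandas. This is a minimal illustration rather than the actual pipeline: the sample rows are invented, the column names (`Pos`, `Min`, `Tkl+Int`, `PrgP`) are assumed to follow FBref-style headers, and "above average" is assumed to mean above the mean of the eligible pool.

```python
import pandas as pd

# Hypothetical sample rows; real data would come from an FBref export.
players = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E"],
    "Pos": ["CB", "RB", "CB", "LB", "CB"],
    "Min": [2900, 1400, 1200, 2500, 2000],
    "Tkl+Int": [3.9, 3.1, 4.5, 2.8, 2.0],   # defensive actions per 90
    "PrgP": [5.2, 4.8, 6.0, 3.9, 2.5],      # progressive passes per 90
})

MIN_MINUTES = 1350

# Step 1: centre-backs and right-backs with enough minutes.
eligible = players[players["Pos"].isin(["CB", "RB"]) & (players["Min"] >= MIN_MINUTES)]

# Step 2: above the pool average in both defending and progression
# (assumption: the average is taken over the eligible pool).
above_avg = eligible[
    (eligible["Tkl+Int"] > eligible["Tkl+Int"].mean())
    & (eligible["PrgP"] > eligible["PrgP"].mean())
]

shortlist_names = above_avg["Player"].tolist()
print(shortlist_names)
```

Player C is dropped for minutes, D for position, and E for below-average output, which leaves A and B in this toy pool.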
Exploratory Data Analysis (EDA)
Before running PCA, I ran a full exploratory analysis on key defensive, passing, and build-up metrics.
These visualisations helped identify patterns in the data and ensured we weren’t feeding noisy, redundant, or irrelevant features into the model.
1. Distribution of Key Metrics
The histograms above show how each feature is distributed across all qualifying defenders. A few takeaways:
Tkl+Int, Tkl%, and Blocks are fairly normally distributed, with most players clustered around the mean. A few elite ball-winners stand out at the far right tail.
Clearances and Sw (Switches attempted) have flatter distributions with most of the mass toward the low end, suggesting most defenders are doing low-volume long-passing or are less involved in aerial clearance duties.
Carries - PrgDist is right-skewed: only a small group of defenders are regularly advancing the ball upfield via carries.
PrgP (Progressive Passes) and Recov (Ball Recoveries) are similar: most defenders show moderate levels of ball progression or recovery activity, but a handful dominate both areas.
Passing accuracy (Total - Cmp%) forms a tight bell curve between 70–90%, suggesting ball security is consistent across profiles.
Attacking metrics like xA, KP, xG, SoT, and SCA90 show sharp right skews. These are low for the majority of defenders, with just a few pushing into territory you'd associate with wide creators or converted midfielders.
These distributions helped flag which stats are high-variance and meaningful, and which are niche or skewed by system roles.
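One way to put a number on what the histograms show is sample skewness. The sketch below uses invented values for two columns and pandas' built-in `skew()`; a clearly positive value means a long right tail (a few high outliers), while a value near zero means a roughly symmetric bell shape.

```python
import pandas as pd

# Hypothetical per-player values, for illustration only.
sample = pd.DataFrame({
    "xA": [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.20, 0.35],
    "Total - Cmp%": [78, 80, 82, 83, 84, 85, 86, 88, 90],
})

# Positive skew: long right tail. Near zero: roughly symmetric.
skews = sample.skew()
print(skews.round(2))
```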
2. Correlation Heatmap Analysis
The heatmap highlights relationships between the features and reveals potential redundancies:
Carries - PrgDist and Recov show a strong positive correlation (0.83): defenders who carry the ball forward also tend to recover it frequently, likely a marker of defensive midfield traits or aggressive wide CBs.
PrgP (Progressive Passes) correlates well with Carries - PrgDist (0.56) and moderately with Recov (Recoveries) (0.55), forming a cluster of progression-heavy traits.
xA, KP, and SCA90 are all tightly correlated (above 0.82), meaning any one of them could act as a proxy for creation. I kept all three for nuance.
Blocks show inverse correlation with creation and progression stats (up to -0.6 with SCA90 and KP), which reflects the trade-off: players who spend more time in deep defending zones are less involved in creative phases.
Tkl+Int and Tkl% barely correlate with progression or attacking metrics. This independence suggests you can include them in the PCA without risking noise from offensive contributions.
Clr (Clearances) shows a strong negative correlation with the creation cluster (xA, KP), reinforcing that it reflects a more reactive, clear-first profile, very different from the ball-playing types.
This helped group metrics into defensive, ball-progression, and attacking categories and supported the choice of a PCA-based model instead of a classifier.
PCA can blend these clusters naturally without flattening them into binary outputs.
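For anyone wanting to reproduce the redundancy check, here is a rough sketch of how highly correlated pairs can be pulled out of a correlation matrix. The data is synthetic (two creation metrics built from a shared signal plus an unrelated tackle rate), and the 0.8 cut-off is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
creation = rng.normal(size=n)  # shared "creation" signal (synthetic)

df = pd.DataFrame({
    "xA": creation + rng.normal(scale=0.2, size=n),
    "KP": creation + rng.normal(scale=0.2, size=n),
    "Tkl%": rng.normal(size=n),  # unrelated to creation by construction
})

corr = df.corr()  # Pearson correlation, the same quantity a heatmap displays

# Keep only the upper triangle so each pair appears once, then flatten.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

# Pairs above the (assumed) threshold are candidates for redundancy.
redundant = pairs[pairs.abs() > 0.8]
print(redundant)
```

Only the xA/KP pair is flagged here, mirroring the tight creation cluster described above.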
3. Pair Plot Analysis
The pairplot gives a granular view of how these relationships behave across the dataset. A few things stood out:
xA, KP, and SCA90 show a tight, almost linear correlation band, ideal for PCA, since we know these are different names for the same creative impact.
PrgP and Carries - PrgDist also cluster tightly, confirming that defenders with one form of progression often have the other.
xA and xG are loosely correlated, suggesting that some defenders create chances but don’t shoot, and vice versa. This helps differentiate between overlapping full-backs and inverted ones.
Tkl% and most other metrics show diffuse clouds, which is fine. Tackle success rate is often system-dependent and doesn’t follow a strong linear trend with raw defensive volume.
Together, the histogram, heatmap, and pairplot confirmed the diversity of defender types in the data and validated the use of PCA to compress these profiles into a readable 2D space without discarding key traits.
These visualisations confirm that the dataset captures a range of defender styles, reinforcing the need for a feature set that balances defensive work with progression and creative output.
Feature Selection for the PCA Model
After this analysis, I settled on a feature set that reflects the full spectrum of a modern hybrid defender’s responsibilities. This set was standardised and passed into PCA:
Defensive Work
Tkl%
Tkl+Int
Blocks
Clr (Clearances)
Ball Retention & Recovery
Total - Cmp%
Recov (Recoveries)
Progression
PrgP (Progressive Passes)
Carries - PrgDist
Final Third and Attacking Output
xA (Expected Assists)
xG (Expected Goals)
KP (Key Passes)
SoT (Shots on Target)
SCA90 (Shot-Creating Actions per 90)
This blend captures everything from pure defensive output to on-ball progression and occasional attacking contributions, without biasing the model toward attacking full-backs or low-touch centre-backs.
The feature scaling and PCA transformation then reduced this space into two components, which were used for visualisation, quadrant analysis, and clustering. The next step is mapping individual players within that space and surfacing candidates for Arsenal’s versatile CB/RB role.
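The scaling and two-component reduction can be sketched with scikit-learn as below. This is not the exact code behind the charts: the data is random noise standing in for real player rows, and the column labels are simply the feature names listed above.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = [
    "Tkl%", "Tkl+Int", "Blocks", "Clr", "Total - Cmp%", "Recov",
    "PrgP", "Carries - PrgDist", "xA", "xG", "KP", "SoT", "SCA90",
]

# Random placeholder data; real rows would be one defender each.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(300, len(features))), columns=features)

# Standardise every feature, then keep the first two principal components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
coords = pipeline.fit_transform(df[features])

pca = pipeline.named_steps["pca"]
# Loadings show how much each metric contributes to each component.
loadings = pd.DataFrame(pca.components_.T, index=features, columns=["PC1", "PC2"])

print(pca.explained_variance_ratio_)
print(loadings.round(2))
```

Inspecting `loadings` is what lets you label an axis as, say, "progression" or "defensive volume" instead of treating the components as a black box.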
How the PCA Model Works (PCA1)
This first PCA model (PCA1) is designed to cut through noise and surface patterns across a large and positionally messy dataset. Thousands of players are listed as defenders, many of whom do not play anything close to the role Arsenal are targeting. The aim here is not to pick players, but to isolate profiles: who behaves like a centre back, who overlaps like a full back, and who mixes both.
Technical Overview
Principal Component Analysis (PCA) is a linear dimensionality reduction technique that identifies new axes (called principal components) that capture the most variance in the data. Instead of treating each metric as a separate, isolated feature, PCA transforms them into weighted combinations that summarise how players vary across all attributes.
Here’s how it works, step by step:
Standardisation: All input features are first scaled so that they have a mean of zero and standard deviation of one. This ensures that metrics like tackles and progressive passes carry equal weight, regardless of their original scale.
Covariance Matrix Construction: A covariance matrix is calculated to identify how metrics vary together across the dataset. For example, if recoveries and carries always rise and fall together, their covariance will be high.
Eigen Decomposition: PCA then solves for the eigenvalues and eigenvectors of this covariance matrix. Each eigenvector represents a principal component, a direction in feature space, while each eigenvalue tells you how much variance is captured along that axis.
Projection: The data is then projected onto the first two principal components. This step compresses high dimensional data into two dimensions while retaining as much variance as possible.
The end result is a 2D representation of complex player behaviour. Players who cluster together in PCA space behave similarly across multiple features, even if their raw numbers look different in isolation.
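The four steps above map almost one-to-one onto NumPy calls. The sketch below runs them on synthetic data (100 hypothetical players, 5 hypothetical metrics), so the numbers mean nothing in football terms; it is only there to make the mechanics concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))   # rows = players, columns = metrics (synthetic)
X[:, 1] += 0.9 * X[:, 0]        # make two metrics rise and fall together

# 1. Standardisation: zero mean, unit standard deviation per metric.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardised metrics.
cov = np.cov(Z, rowvar=False)

# 3. Eigen decomposition (eigh is intended for symmetric matrices).
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]             # sort largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Projection onto the first two principal components.
coords = Z @ eigvecs[:, :2]

explained = eigvals[:2] / eigvals.sum()       # share of variance per axis
print(coords.shape, explained.round(3))
```

The variance of each projected axis equals its eigenvalue, which is why the eigenvalues can be read directly as "how much of the story this axis tells".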
In much simpler terms:
PCA (Principal Component Analysis) helps us see the big picture of player styles by reducing complex stats into simple patterns. First, it standardises different metrics. Then it uses a covariance matrix to show how stats move together, such as tackles rising with interceptions. From this, it extracts eigenvectors, which represent the key combinations of traits that define player styles, like "box-to-box engine" or "poacher." These are then used to project players onto a 2D map, where similar profiles cluster together. Instead of juggling dozens of stats, PCA reveals the few big themes that truly define how players operate.
Why PCA1?
This first PCA is intentionally broad. It prioritises coverage over granularity. The goal is to cluster all defenders: CBs, FBs, wing-backs, hybrids, and other odd fits. Only then can we start to make sense of what is quite a messy dataset. K-Means clustering is applied after PCA to identify segments with shared characteristics.
This gives us a strategic overview of the defender landscape, i.e. a heat map of roles.
What Comes Next (PCA2)
Once PCA1 isolates clusters of interest, particularly those containing players with the defensive metrics of a centre back and the progression metrics of a full back, we can zoom in. PCA2 will run on a much smaller, hand-picked dataset that includes only those relevant profiles.
The feature set will also expand. PCA2 will include more granular technical and stylistic data. Further segmentation will be applied and then plotted on a chart where quadrant shading maps general archetypes:
High progression, high defending
High defending, low progression
High progression, low defending
Low across both
The goal is to filter for Arsenal’s tactical blueprint, not just general competence.
PCA1 is about segmentation. PCA2 is about selection.
Segmenting Defender Playstyles Using K-Means Clustering
To refine the shortlist beyond just raw PCA outputs, K-Means clustering was applied to segment defenders based on playstyle rather than nominal position. This allows the analysis to cut across traditional labels and instead focus on what players actually do on the pitch.
Why Clustering Matters
Labels like CB or RB don't mean much in isolation. Plenty of players blur the line between roles. Calafiori drifts into midfield, Timber builds from deep, Hakimi defends like a winger, and Akanji ends up as a holding midfielder in possession. Clustering helps group players by style, not by squad list job titles.
By clustering on a mix of defensive output, progression metrics, and involvement in build-up, we can isolate player types:
Classic stoppers
Ball carriers from deep
Overlapping wide defenders
Passive defenders in high possession systems
Vertical passers from the back line
This helps us target profiles that can rotate into Arsenal’s right centre-back and right-back zones without disrupting shape or buildup flow.
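As a rough sketch of the clustering step, the snippet below runs scikit-learn's K-Means on 2D coordinates of the kind PCA produces. The coordinates are three synthetic, well-separated blobs and `k=3` is chosen only to keep the toy example readable; real player data is far messier and the plots in this piece use more clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)

# Synthetic stand-in for PCA coordinates: three artificial "styles".
centres = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 5.0]])
coords = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

# Group players by position in PCA space, not by their listed role.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)

print(np.bincount(labels))               # players per cluster
print(kmeans.cluster_centers_.round(2))  # the "average player" of each style
```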
Cluster Breakdown (PCA1)
Below is the initial K-Means plot and, as expected, it’s chaotic. As previously stated, the point of PCA1 is not to generate a shortlist but to visualise patterns in a large, positionally ambiguous pool.
Looking at the PCA + KMeans output, several clusters stand out:
Cluster 0 (dark red):
Physically dominant, defensively active centre backs. This group includes players like Tosin Adarabioyo, Micky van de Ven, Daniel Vivian, and Manuel Akanji. These are duel-heavy defenders who thrive in deeper or more transitional systems. They defend with volume and intensity but offer less in-possession progression.
Cluster 1 (teal):
Mobile, modern full backs who can slot into both back four and back five systems. Names like Conor Bradley, Diogo Dalot, Pervis Estupiñán, and Rayan Aït-Nouri pop up here. These players blend ball-carrying with decent defensive output. Flexible and tactically useful, though not always standout in high-possession sides.
Cluster 3 (blue):
Progressive, quarterback-style centre backs. Think Nathan Aké, Joachim Andersen, Yeray Álvarez, and Igor Zubeldia. These defenders are calm under pressure, pass well from deep, and contribute meaningfully to buildup through both switches and carries. Less aggressive defensively, but ideal for possession-based systems looking for backline distributors.
Cluster 4 (cyan):
High-output attacking full-backs. Trent Alexander-Arnold, Grimaldo, Hakimi, Theo Hernández, and Nuno Mendes headline this group. Strong progressive and creative numbers, but often come with tradeoffs in defensive stability and shape management.
Cluster 5 (orange):
Safe, low-variance defenders. Players like Ola Aina, Kristoffer Ajer, and Kelvin Amian who offer solid baseline metrics without excelling in either direction. Reliable but not system-defining.
Cluster 6 (pink):
Young, aggressive, attacking wide defenders. Alejandro Balde, Raoul Bellanova, Javi Galán, and Leif Davis show up here. Often post inflated xA and carry stats, likely due to system context or sample size. Similar to Cluster 4 but more raw and volatile.
Below you’ll find a snippet of each cluster’s table.







Clusters 0 and 3 offer different flavours of centre-back rotation options. Cluster 0 contains physically assertive stoppers who defend space and win duels. Cluster 3 includes technically assured, on-ball centre-backs who can start attacks and manage tempo under pressure. Both are relevant depending on the tactical context of the rotation.
Cluster 1 offers the kind of flexible wide defender who can operate in either a back three or four. Useful depth profiles, but not ideal for anchoring possession buildup.
Clusters 4 and 6 are high-variance creators from wide areas. Fun on the ball, risky out of it. Arsenal’s current setup likely needs more control and structure than these profiles typically offer.
For this project, I’m focusing on Cluster 3. With 144 players, it’s the most populated group, but that makes sense: this is where many of the game’s most composed, technically secure defenders live. For what Arsenal need, calmness under pressure and the ability to play a variety of passes out of the back matter far more than brute strength or high defensive volume. I envisage the selected player being rotated into matches where the team will have more of the ball and less defending to do. In those situations, control beats chaos.
It’s also worth noting that the overlap in play styles across roles means that choosing the number of clusters from WCSS (within-cluster sum of squares) and the elbow method only goes so far. Defenders blur lines constantly: some centre-backs carry like full-backs, some full-backs defend like sweepers, and that ambiguity shows in the results.
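For context, the elbow method amounts to fitting K-Means for a range of cluster counts and watching where the WCSS (exposed by scikit-learn as `inertia_`) stops falling sharply. A minimal sketch on synthetic blobs follows; with real, overlapping player profiles the bend is much less obvious, which is exactly the limitation described above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Synthetic 2D points with three obvious groups (not real player data).
centres = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 5.0]])
coords = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

# WCSS for k = 1..8; scikit-learn stores it on the fitted model as inertia_.
wcss = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords)
    wcss[k] = model.inertia_

# Improvement gained by adding each extra cluster; the "elbow" is where
# this improvement collapses.
drops = {k: wcss[k - 1] - wcss[k] for k in range(2, 9)}

for k in sorted(wcss):
    print(k, round(wcss[k], 1))
```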
Cluster Breakdown (PCA2)
This second PCA is far more targeted. The dataset has been filtered down to players who passed the eye test from PCA1, defenders who showed meaningful technical involvement, composure in possession, and progressive output. The aim here is no longer just segmentation. It's surgical. Who can play under pressure, who can ping a diagonal or switch play, and who can step into a system like Arsenal's without the wheels falling off?
Again, K-Means clustering was applied post-PCA. The clustering process still has its limitations (overlapping styles, noisy metrics, and subjective feature weighting), but the result is a cleaner spread of defender types. The silhouette scores were stronger than in PCA1, and the clusters better reflect meaningful stylistic boundaries.
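For readers unfamiliar with silhouette scores, the comparison can be sketched like this. The two datasets are synthetic: a loose, overlapping one standing in for a broad first pass and a tighter one standing in for a more targeted second pass. The helper names are hypothetical, and the scores say nothing about the real PCA1 and PCA2 values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def blobs(scale, seed):
    """Three synthetic groups; a larger scale means more overlap."""
    rng = np.random.default_rng(seed)
    centres = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 5.0]])
    return np.vstack([c + rng.normal(scale=scale, size=(60, 2)) for c in centres])

def clustered_silhouette(coords, k):
    """Fit K-Means and return the mean silhouette score (range -1 to 1)."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords)
    return silhouette_score(coords, labels)

broad_score = clustered_silhouette(blobs(scale=2.5, seed=0), k=3)
targeted_score = clustered_silhouette(blobs(scale=0.6, seed=0), k=3)

print(round(broad_score, 2), round(targeted_score, 2))
```

A higher score means points sit closer to their own cluster than to neighbouring ones, which is the sense in which one clustering is "cleaner" than another.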
Cluster 0:
Stoppers. Big numbers in clearances, blocks, and duels, but low contribution in buildup compared to the rest of the clusters. Not the best stylistic alignment for Arsenal’s needs.
Cluster 1:
Active, all-round defenders. Solid metrics across the board, these players are reliable in duels and reasonably confident in buildup, without standing out in any single area.
Cluster 2:
Top-end defensive output with strong ball-playing. This group is small but includes some of the most complete defensive profiles. There is a lot of promise for full-back shifts as well. Names in here would expect to start for a Champions League team.
Cluster 3:
Transitional defenders. These players are often seen in mid-table or pressing systems. They're capable of progression but less clean in tight spaces.
Cluster 4:
Composed, high-touch defenders. Smaller group, but the profiles here show strong on-ball security and positional discipline. Useful in high-possession systems.
Cluster 5:
Calm, progressive centre backs. This is the cluster I’m focused on. It includes defenders who not only carry the ball but can shift it intelligently, maintain shape, and manage the tempo from deep areas. With 23 players in this group, it offers a mix of experience, youth, and versatility.
Cluster 6:
Balanced profiles. These defenders show a strong mix of physical presence, defensive output, and enough ball-playing ability to hold their own in possession. They're not elite passers or high-volume progressors, but they offer solid defensive foundations with competent distribution and carrying. This is the kind of group you look at if you want someone who can hold up in both high-duel and moderate build-up environments. Most would be comfortable rotating into a system like Arsenal’s without being a pure stylistic match.
The snippet of each cluster can be seen below:







PCA2: Mapping Defender Profiles with Expanded Data
After using clustering in PCA2 to identify more granular profile groups within the dataset selected from PCA1, the next step zooms in with a much richer feature set.
Using a wider range of metrics, covering passing, ball progression, pressures, defensive actions, and more, this model provides a 2D map of defender styles across Europe. Each point represents a player, plotted by their score along the first two principal components. These components capture the underlying variance in playing style across all selected metrics.
Understanding the Chart
The chart is produced by a model that:
Standardises the data (so all features contribute equally)
Reduces dimensionality via PCA (to the top 2 principal components)
Plots each player on this new 2D map of defender behaviour
Shades the space into four quadrants based on custom thresholds:
🟩 Top-right (Green): Elite Progression + Defensive Output
These are the unicorns: players who carry or pass forward and defend well. Ideal profiles for a high-possession, high-line team like Arsenal.
🟦 Top-left (Blue): Defensive Volume, Low Progression
Traditional, deeper-line defenders. Strong in duels, blocks, and recoveries, but limited on the ball.
🟥 Bottom-left (Red): Low Progression + Low Defensive Output
Profiles with minimal output in both directions, likely poor fits for elite-level systems.
🟨 Bottom-right (Gold): Ball-Players, Light Defenders
Stylish defenders who offer progression via passing or carrying but may lack volume in duels or positional discipline.
Each quadrant gives a clearer sense of role fit. The green and gold zones highlight players suited to Arsenal’s ball-dominant system, while the blue zone surfaces options better suited to low-block or emergency defending.
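The quadrant logic itself is simple enough to sketch. The version below assumes the horizontal axis tracks progression and the vertical axis tracks defensive output, and it uses thresholds of zero purely as placeholders, since the custom thresholds behind the chart are not stated here. The function name is hypothetical.

```python
def quadrant(progression, defending, x_thresh=0.0, y_thresh=0.0):
    """Map a player's 2D position to a quadrant colour.

    Assumptions: x-axis = progression, y-axis = defensive output,
    and both thresholds default to 0 (placeholders, not the real values).
    """
    high_prog = progression >= x_thresh
    high_def = defending >= y_thresh
    if high_prog and high_def:
        return "green"  # top-right: elite progression + defensive output
    if high_def:
        return "blue"   # top-left: defensive volume, low progression
    if high_prog:
        return "gold"   # bottom-right: ball-players, light defenders
    return "red"        # bottom-left: low on both

# Hypothetical coordinates for four players.
examples = {"P1": (1.2, 0.8), "P2": (-0.9, 1.1), "P3": (-1.0, -0.7), "P4": (1.5, -0.4)}
for name, (x, y) in examples.items():
    print(name, quadrant(x, y))
```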
PCA1 helped narrow things down, but it couldn’t fully distinguish between good passers and true quarterback centre-backs. PCA2 adds more context by combining broader passing and defensive stats, helping surface profiles that were blurred or misclassified earlier.
Heuristics like the elbow method for choosing the number of clusters are useful but imperfect, especially when playstyles overlap. This second PCA helps correct for that.
I used it to trim the 144-player set down to 79, focusing on those in the blue, green, and gold areas who show composure under pressure, ball progression range, and stylistic fit for Arsenal.
Analysis of Playmaking Centre-Backs
These six scatter plots break down passing progression, accuracy, creation, and defensive performance. They’re designed to help surface the most Arsenal-aligned profiles, defenders who can play under pressure, progress possession, and still hold their own when defending transitions.
1. Progressive Creators
(PrgP vs. xA)
This chart maps progressive passes against expected assists. It isolates true deep-lying creators from defenders who simply rack up volume.
Castello Lukeba and Fikayo Tomori are the standout names, offering both forward progression and creative risk.
Pape Abou Cissé and Murillo are also impressive: aggressive with the ball and not afraid to play into the final third.
Robin Le Normand quietly pops here too, with strong creative returns from deeper zones.
2. Line-Breakers
(PPA vs. SCA90)
This one focuses on box-entry passes and total shot creation, a proxy for line-breaking intent and final-third involvement.
Tomori, Murillo, and Lukeba again show up strong, but Konstantinos Koulierakis deserves attention too, tidy on the ball but still ambitious.
3. Receivers Who Progress
(Rec vs. PrgR)
Here we track defenders who receive the ball often and then drive it forward, not just passively recycling but actively initiating play.
Jake O'Brien stands out as a progression-first ball receiver, pushing play upfield consistently.
Lukeba again looks comfortable, showing two-way involvement.
4. Accurate and Creative
(Pass % vs. xA)
A clean filter for defenders who can create without losing the ball. It rewards those who deliver quality with consistency.
Tomori leads the way here, elite xA and above-average accuracy.
Koulierakis, Le Normand, and Murillo all offer above-median creativity at solid completion rates.
Pacho quietly posts excellent pass completion numbers with modest creative output, tidy, if not flashy.
5. Progressive Passers
(PrgP vs. Pass %)
This view shows how press-resistant and productive a defender is with the ball. The top-right names are your steady high-risk, high-reward passers.
Malick Thiaw, Tomori, Yoro, and Torres headline this quadrant, all confident in threading lines under pressure.
Mosquera and Sarr also post strong numbers without letting efficiency dip.
6. Dual Threats
(Defending Score vs. Passing Score)
The final visual sums it up: who balances ball security with strong defensive fundamentals?
Tomori, Pau Torres, Thiaw, and Zubeldia are the elite picks, reliable at the back, progressive with the ball.
Omeragic and Pacho come just under that top tier, blending tidy passing with good positional defending.
Mosquera and Murillo round out a second tier, solid all-rounders with some flair.
To round things out, the charts below act as a gallery of deeper passing and progression traits. Whether it’s verticality, long-range accuracy, touch volume, or dribble-driven penetration, these visuals let you spot unique strengths and tradeoffs across the pool.
Players like Thiaw, Pacho, Tomori, Zubeldia, Mosquera, and Lukeba keep popping up, suggesting a blend of repeatable passing value and structural fit.




To segment the final shortlist, I sorted all centre-backs by a blended metric of Passing Score and Defending Score. From there, I took the top 20 performers. This gave me a concentrated group of hybrid profiles who combine composure in possession with reliable defensive output.
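A ranking like this can be sketched as follows. The exact blend is not specified above, so the sketch assumes a simple equal-weight average of the two scores; the players and numbers are made up, and `TOP_N` is set to 2 only because the toy table is tiny.

```python
import pandas as pd

# Hypothetical scores on a 0-100 scale.
scores = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "Passing Score": [80.0, 60.0, 90.0, 65.0],
    "Defending Score": [70.0, 85.0, 50.0, 66.0],
})

# Assumption: equal-weight blend of passing and defending.
scores["Blend"] = scores[["Passing Score", "Defending Score"]].mean(axis=1)

TOP_N = 2  # use 20 on the full pool
top = scores.sort_values("Blend", ascending=False).head(TOP_N)

print(top[["Player", "Blend"]])
```

Changing the weights (for example, favouring passing for a high-possession side) is a one-line tweak, which makes it easy to test how sensitive the shortlist is to the blend.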
The Shortlist:
1. Stanley N'Soki (TSG Hoffenheim)
Strengths: Good pace for a center-back, comfortable on the ball, decent passing range, can play left-back.
Weaknesses: Can be prone to lapses in concentration, sometimes lacks physical dominance in duels.
Fit: Suits a team that likes to build from the back and values a defender with good recovery pace.
2. Leny Yoro (Manchester United)
Strengths: Exceptional talent for his age, calm and composed on the ball, excellent aerial ability, strong positional sense. High ceiling.
Weaknesses: Still very young and raw, lacks significant top-level experience, physically still developing.
Fit: An ideal long-term investment for a top club looking for a future elite center-back. Needs patient integration.
3. Igor Zubeldia (Real Sociedad)
Strengths: Very strong defensively, good tackler, excellent positional awareness, calm under pressure.
Weaknesses: Not the quickest, can struggle against very agile attackers, limited in offensive contribution.
Fit: A reliable, no-nonsense defender for a team that prioritizes defensive solidity and organization.
4. Pau Torres (Aston Villa)
Strengths: Elegant ball-playing defender, excellent passing from the back, good aerial ability, strong left foot.
Weaknesses: Can be caught out by pace, sometimes lacks aggression in physical duels.
Fit: Perfect for a possession-based team that wants a defender who can initiate attacks and play precise long balls.
5. Fikayo Tomori (AC Milan)
Strengths: Blazing pace, aggressive tackling, strong recovery runs, good leadership qualities, comfortable stepping out.
Weaknesses: Can be overly aggressive leading to fouls, sometimes prone to positional errors.
Fit: Excellent for a high-line defense that needs quick defenders to cover space behind them, or for teams that counter-press aggressively.
6. Malick Thiaw (AC Milan)
Strengths: Tall, strong in aerial duels, good physical presence, decent pace for his size, composed on the ball.
Weaknesses: Still developing consistency, can be caught out by intricate attacking movements.
Fit: A good option for a team looking for a physically imposing center-back who is also comfortable in possession.
7. Malang Sarr (Lens)
Strengths: Good athleticism, comfortable on the ball, versatile (can play left-back).
Weaknesses: Lacks consistent game time, prone to errors, often struggles with decision-making.
Fit: A risky option; needs a fresh start and consistent minutes to rebuild confidence and form.
8. Saidou Sow (Strasbourg)
Strengths: Strong physically, good in the air, tenacious tackler, decent pace.
Weaknesses: Can be impulsive, needs to improve his passing range and composure on the ball.
Fit: Suits a team that values aggressive, physical defending and can tolerate some rawness in possession.
9. Willian Pacho (Paris Saint-Germain)
Strengths: Very good physical attributes, strong in the air, aggressive in duels, quick to cover.
Weaknesses: Still developing his passing game, can be prone to tactical fouls.
Fit: A solid, no-nonsense defender for a team that values physicality and defensive aggression.
10. Becir Omeragic (Montpellier Hérault SC)
Strengths: Good ball-playing abilities, comfortable under pressure, decent passing range, versatile (can play defensive midfield).
Weaknesses: Lacks elite pace, can be physically overwhelmed by stronger strikers.
Fit: A good option for a team that wants a defender who can contribute to build-up play.
11. Gabriel Osho (AJ Auxerre)
Strengths: Athletic and physically strong, good in aerial duels, determined, good work rate.
Weaknesses: Can be prone to tactical errors, passing range is limited, needs to improve consistency at a higher level.
Fit: A battler for a team that values physicality and effort, perhaps best suited as a depth option or for a system relying on defensive grit.
12. Aitor Paredes (Athletic Club)
Strengths: Strong tackler, good positional awareness, aggressive in duels, good leadership potential.
Weaknesses: Not the quickest, limited in attacking contributions.
Fit: A reliable, traditional center-back for a team that prioritizes defensive stability.
13. Youssouf Ndayishimiye (OGC Nice)
Strengths: Versatile (can play CDM), excellent physicality, strong tackling, good in the air, high work rate.
Weaknesses: Can be overly aggressive, sometimes prone to errors when pressed, passing can be inconsistent.
Fit: A good option for a team that wants a tough, versatile defensive player who can cover multiple positions.
14. Abdoulaye Niakhate Ndiaye (Stade Brestois 29)
Strengths: Tall and physically imposing, good in aerial duels, decent pace for his size, promising potential.
Weaknesses: Still raw, needs to improve his technical skills and decision-making.
Fit: A long-term project for a team willing to develop a physically dominant defender.
15. Mickael Nade (AS Saint-Étienne)
Strengths: Strong physically, good aerial presence, no-nonsense defender.
Weaknesses: Lacks pace, limited technical ability, can be caught out by quick attackers.
Fit: A traditional, rugged defender, best suited for a team that doesn't prioritize playing out from the back.
16. Cristhian Mosquera (Valencia CF)
Strengths: Athletic defender, good pace, comfortable on the ball, promising potential.
Weaknesses: Young and inexperienced, can be prone to positional errors, needs to develop physically.
Fit: A good prospect for a team looking to invest in a young, athletic center-back.
17. Murillo (Nottingham Forest F.C.)
Strengths: Strong and aggressive, good tackler, decent aerial ability, brings a competitive edge.
Weaknesses: Can be prone to impulsive decisions, sometimes lacks composure on the ball.
Fit: A combative defender for a team that needs a strong presence at the back and values defensive aggression.
18. Abdul Mumin (Rayo Vallecano)
Strengths: Very athletic, good pace, strong in duels, decent recovery speed.
Weaknesses: Can be inconsistent with his passing, sometimes prone to lapses in concentration.
Fit: A good option for a team that plays a high line and needs quick defenders to track back.
19. Mark McKenzie (Toulouse FC)
Strengths: Good passing range, comfortable on the ball, decent pace, good leadership qualities.
Weaknesses: Can struggle physically against stronger strikers, sometimes prone to errors under pressure.
Fit: Suits a team that wants a ball-playing center-back who can initiate attacks.
20. Chrislain Matsima (FC Augsburg)
Strengths: Athletic, good recovery pace, comfortable on the ball, promising potential.
Weaknesses: Still developing, can be inconsistent, lacks experience at the top level.
Fit: A developmental prospect for a team looking for an athletic, modern center-back.
21. Jon Martin (Real Sociedad)
Strengths: Good technical ability, comfortable on the ball, good passing range, strong leadership potential.
Weaknesses: Physically still developing, needs to adapt to higher intensity.
Fit: A long-term prospect for a team willing to develop a technically sound, ball-playing defender.
22. Castello Lukeba (RB Leipzig)
Strengths: Excellent pace, very good on the ball, strong tackler, composed under pressure, strong progressive passing.
Weaknesses: Can sometimes be caught out by aerial threats, relatively new to top-tier European football.
Fit: Ideal for a modern, high-pressing team that requires quick, ball-playing defenders comfortable in a high line.
23. Jake O'Brien (Everton FC)
Strengths: Tall (6'6") and physically imposing, strong in aerial duels, surprising pace for his size, composed on the ball, can contribute goals from set pieces.
Weaknesses: Still developing his consistency in defensive decision-making in the Premier League, can be overly aggressive at times.
Fit: A strong option for a team needing a dominant aerial presence and a defender capable of playing in a higher line. His physical attributes and comfort on the ball make him a good fit for a modern, athletic defense.
Assessing Impact with a Random Forest Classifier
The Ball-Playing CB Model evaluates centre-backs using a custom Improvement Index (0–10 scale) that blends progressive passing, carrying into space, duel success, press resistance, and build-up reliability.
Each player falls into one of three buckets:
Ceiling Raisers – Big impact players who could elevate the system but are likely too expensive, too established, or not plausible backups.
Floor Raisers – Solid, tactically reliable options who improve the baseline quality without necessarily needing to start.
Stabilisers – Depth pieces that bring balance, composure, or long-term upside.
Ceiling Raisers:
Fikayo Tomori (10.0), Leny Yoro (9.5), and Castello Lukeba (9.1) top the list with elite profiles. All three bring high-end ball security and proactive defending, but realistically, they’re starters at top clubs or headed there.
Great players, wrong context: they wouldn’t come to play second fiddle.
Floor Raisers:
Stanley N'Soki (9.0), Youssouf Ndayishimiye (8.8), and Malang Sarr (8.2) offer consistent upside and profile versatility. These are high-floor options that would add value when rotated in.
They fit a brief where tactical compliance matters more than star power.
Stabilisers:
The majority of names fall here. Jake O'Brien (7.4), Becir Omeragic (7.2), Murillo (6.7) and Zubeldia (6.4) are good examples, players who won’t dominate headlines but tick the right boxes in passing security, aerial competence, and comfort under pressure.
Several names with a score of 5.0 or higher show enough to justify inclusion in a squad playing a high-line, ball-dominant system.
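For readers who want to see the mechanics, here is a minimal sketch of how a blended 0–10 index and the three buckets could be put together. It is not the exact formula behind the scores above: the metric names, the equal weights, the min-max scaling, and the bucket cut-offs (9.1 and 8.0, picked only because they are consistent with the scores quoted in this section) are all assumptions for illustration.

```python
# Hypothetical metric names; the real model's inputs and weights are not published here.
METRICS = [
    "progressive_passing",
    "carrying",
    "duel_success",
    "press_resistance",
    "buildup_reliability",
]


def min_max_scale(values):
    """Rescale a list of raw values to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread: treat everyone as average
    return [(v - lo) / (hi - lo) for v in values]


def improvement_index(players, weights=None):
    """Blend scaled metrics into a 0-10 score per player.

    players: dict of name -> dict of metric -> raw value.
    weights: optional dict of metric -> weight (equal weights assumed by default).
    """
    weights = weights or {m: 1.0 for m in METRICS}
    names = list(players)
    scaled = {
        m: dict(zip(names, min_max_scale([players[n][m] for n in names])))
        for m in METRICS
    }
    total_weight = sum(weights[m] for m in METRICS)
    return {
        n: round(10 * sum(weights[m] * scaled[m][n] for m in METRICS) / total_weight, 1)
        for n in names
    }


def bucket(score, ceiling_cut=9.1, floor_cut=8.0):
    """Assign a bucket label; both cut-offs are assumed, not taken from the model."""
    if score >= ceiling_cut:
        return "Ceiling Raiser"
    if score >= floor_cut:
        return "Floor Raiser"
    return "Stabiliser"


# Toy data only: three made-up players, not real scouting numbers.
toy_players = {
    "Player A": {m: 0.9 for m in METRICS},
    "Player B": {m: 0.5 for m in METRICS},
    "Player C": {m: 0.1 for m in METRICS},
}
scores = improvement_index(toy_players)
labels = {name: bucket(score) for name, score in scores.items()}
```

Because the scaling is relative to the pool, the best player on every metric always lands on 10.0, which is worth remembering when reading a top score: it says "best in this sample", not "perfect".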
What Arsenal Should Be Targeting
Given Saliba and Gabriel are elite already, the goal isn’t to replace them but to find someone who can come in without causing a drop-off in structure or technical assurance. That makes stabilisers with scores above 5.0 the real sweet spot. These are players that are good enough to step in, not so good they’d expect to start every week.
Recommendations & Ruminations
Fikayo Tomori
Always feels like one of those defenders who got out at the right time. England never really gave him the trust he deserved, and he’s gone and built a career in Milan instead. Quick, strong, aggressive front-foot defender, but his on-ball game has come on loads. Loves a punchy progressive pass through the lines, and isn’t afraid to carry when the game state allows it. He’s got that modern blend of athleticism and calmness that top teams demand. Feels like a player who’s grown up away from the English media echo chamber, more polished, more serious. Would slot in seamlessly as a right-sided rotation piece for Saliba. Might be expensive, though, and in a World Cup year any Premier League move would need to come with far more minutes than we can offer.
Castello Lukeba
If you’re looking for a left-footer who can carry, pass, and play with composure under pressure, this is your guy. Lukeba doesn’t always make highlight reels, but his ability to calmly beat pressure and break lines is incredibly rare at his age. He’s still a bit raw positionally, but the upside is massive. Physically agile, technically smooth, and tactically versatile. Feels like one of those players who, if you don’t buy him now, you’re paying twice the price in two years.
Not quite a star, but has the tools to become one with the right coaching. A bit like Gabriel but with more finesse on the ball. However, he would command a hefty fee, and I think we’d have to sell either Kiwior or Calafiori, and I don’t think he’s a better player than either right now. All in all, still a wonderful footballer.
Youssouf Ndayishimiye
A midfielder-turned-centre-back hybrid with the footwork to match. You watch him play, and there’s a looseness to his body shape and passing angles that just screams former DM. Brave in tight spaces, crisp off both feet, and tactically intelligent enough to cover wide zones. Not the tallest, but wins his fair share in duels thanks to timing and positioning. Could be a fascinating depth option who can step into midfield zones in build-up and keep things ticking. Probably more of a left-field shout, but fits the Zinchenko-era blueprint of multifunctional defenders with ball control and press resistance, from the right side. Low risk, high upside.
Stanley N’Soki
Still feels like a hidden gem. Strong left foot, very decent in duels, and not shy about threading passes into midfield. Has that classic Ligue 1-to-Europe profile: under-scouted, under-hyped, and probably underpriced. The main appeal is how comfortable he looks stepping out of the back line, whether to carry or to bait the press and play through it. Probably not someone who walks into Arsenal’s XI, but could be an excellent squad option, especially in a 2+3 build-up structure where his range and left-footed balance are useful. You’re not buying fireworks, you’re buying calm and competence. Again, I don't think we need another left footer, shame really, because he’s a real player in my opinion.
Igor Zubeldia
Mr. Composure. The kind of defender who never looks rushed, even when he probably should be. A holding midfielder by trade, now anchoring the back line for La Real. It shows in the way he plays, always scanning, always aware of the next pass. Rarely smashes it long when a disguised ball into the pivot is on. Not the most aerially dominant, but very positionally sound and rarely caught diving in. Feels like a more economical, Spanish Ben White, calm under pressure, tactically reliable, and very easy to build around. Slightly older at 28, and I do have some questions around adaptation to the PL. A hard maybe from me.
Cristhian Mosquera
Big frame, two-footed, lovely weight of pass, and shows serious confidence stepping into midfield spaces. Defensively, he’s still learning the ropes: his positioning can wobble, he doesn’t dominate duels yet, and he’s quite weak aerially for a big guy. But the tools are so clearly there. A centre-back who plays like he’s wearing the No. 6 shirt.
I’m posting this piece just a week or so before Arsenal are set to announce him, which makes this all the more satisfying. If you want to future-proof the back line and build a long-term replacement or partner for Saliba/Gabriel, Mosquera ticks the lot. High ceiling, reasonable cost, fits the age curve. Chapeau Bert-teta.
Willian Pacho
Clean, technical, and just weirdly under the radar given how polished he already looks. Very solid in possession, rarely gives it away, and can play sharp line-breaking passes with minimal wind-up. Not the flashiest, not the most aggressive, but ticks a lot of boxes. Left-footed, consistent, good recovery speed, and stylistically fits a side that wants to dominate the ball. He’s already shown he can cope in a structured build-up system, which is half the battle in Arteta’s shape. Feels like someone who won’t make headlines but will make a manager sleep better. Steady as they come.
Conclusion and Next Steps
This shortlist was built using a two-stage PCA modelling approach: first to reduce dimensional noise and segment players by stylistic fit, then to fine-tune selection using a second round of principal component analysis focused on progression, passing quality, and defensive output. It wasn’t about overfitting to statistical anomalies or outliers. It was about distilling the chaos of raw numbers into clearer role-based zones, then trimming intelligently using visual segmentation and percentile cutoffs.
The original pool of 144 players was reduced to 79 by filtering through the blue, green, and gold quadrants. These were players offering defensive volume, composure under pressure, and progressive quality in possession. From there, I ranked profiles by passing and defensive scores to isolate a final 25-man cut.
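As a rough illustration of the two-stage idea, the sketch below runs PCA on a pool, trims it with a percentile cutoff on the first component, then runs PCA again on the survivors. Everything specific in it is an assumption: the data is random noise standing in for real metrics, the eight columns are hypothetical, the 50th-percentile cutoff is arbitrary, and the colour-quadrant segmentation described above is replaced by a simple one-axis filter. PCA is done by hand with NumPy's SVD rather than whatever tooling the actual model used.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Standardise the columns of X and project rows onto the top principal components.

    Returns (scores, explained_variance_ratio) for the requested components.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes, ordered by singular value (largest first).
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


def percentile_filter(values, pct):
    """Boolean mask selecting entries at or above the given percentile."""
    values = np.asarray(values, dtype=float)
    return values >= np.percentile(values, pct)


# Synthetic stand-in: 144 players by 8 hypothetical per-90 metrics.
rng = np.random.default_rng(0)
X = rng.normal(size=(144, 8))

# Stage one: broad stylistic map of the whole pool.
stage_one, var_one = pca_scores(X, n_components=2)

# Trim the pool on the first component (arbitrary cutoff for illustration).
keep = percentile_filter(stage_one[:, 0], 50)

# Stage two: re-fit PCA on the survivors only, so the axes describe
# the differences within the shortlisted group rather than the full pool.
stage_two, var_two = pca_scores(X[keep], n_components=2)
```

One caveat worth flagging: the sign of a principal component is arbitrary, so "above the cutoff" only means something once you have checked the loadings and confirmed which direction corresponds to the trait you care about.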
If the brief is experience and tactical plug and play, then Tomori, Ndayishimiye, and Zubeldia stand out. All three are high-usage passers with good defensive metrics and the kind of hybrid skillsets Arsenal’s system demands. They can hold a line, break pressure, and build.
If the preference is for a longer-term bet, then Cristhian Mosquera is the pick. He’s still developing, but tactically mature beyond his age and technically secure. I finalised this model back in May, just as early links to Mosquera were surfacing. Now that a deal seems close, I’m genuinely pleased. He fits the criteria as well as anyone and could grow into a top-level solution at either centre-back slot.
The PCA approach isn’t perfect. Real players blur across role types and final judgement should always blend data with tactical context. But for narrowing the field with clarity and logic, it worked. The names that rose to the top did so for good reason.
It was also validating to see how many of them came from similar talent pipelines. A significant number were Clairefontaine graduates – players like N'Soki, Yoro, Sarr, Nade, Matsima, and Lukeba. The modelling captured clusters of development style and on-ball traits, even across leagues.
Still, one weakness in the process was footedness. My scraping method doesn’t reliably pick it up, and the shortlist ended up heavier on left-footers than intended. For a rotation role covering the right side, that’s a gap I’ll need to solve going forward.
Final Comments
Originally, I planned to follow this up with three more posts: one using cosine similarity to find a 6/8 hybrid midfielder, one using anomaly detection to identify a backup goalkeeper, and another using K-Means clustering to shortlist left-backs. But with Arsenal now all but confirming the arrivals of Martin Zubimendi, Christian Nørgaard, and Kepa Arrizabalaga, building fresh identification models for positions that are already filled feels redundant. In addition, I don’t think a left-back is an important position to speak about right now.
I’ve already written a breakdown of Christian Nørgaard’s fit, which you can read here, and there’s also a piece exploring how Zubimendi and Rice might work together here. I’ll follow up with a GK profile on Kepa in the next few weeks.
This whole process has been a rewarding one. From building the datasets to visualising the data and testing different modelling approaches, it's been a genuinely enjoyable exercise. Hopefully, you found it both insightful and engaging, and maybe even a little useful if you're the sort of person who spends your spare time stress-testing transfer links.
Thanks for reading,
Steve