Linear Optimized NBA Team

Our goal is to use Linear Programming to determine the most optimal NBA Team for the upcoming ‘22-’23 NBA Season. We have gathered our data from www.basketball-reference.com and are trying to maximize the Player Efficiency Rating (PER) of the whole team. The most optimal team will need to stay under the ‘22-’23 season salary cap of $123,655,000, and have 15 players, with 3 players per position (PG, SG, SF, PF, C).

Data Sources:
Contracts: www.basketball-reference.com/contracts/players.html
Advanced Stats: www.basketball-reference.com/leagues/NBA_2022_advanced.html
Normal Stats: www.basketball-reference.com/leagues/NBA_2022_totals.html

NBA_Team_Linear_Optimization

Data¶

import os 
from ortools.linear_solver import pywraplp
import pandas as pd
import numpy as np

df_PER = pd.read_csv("/Users/ethan/Downloads/data_sets/NBA_LP_PER.csv", sep=";")
df_PTS = pd.read_csv("/Users/ethan/Downloads/data_sets/NBA_LP_PTS.csv", sep=";")
df_salary = pd.read_csv("/Users/ethan/Downloads/data_sets/NBA_LP_Salary.csv", sep=";")

#Filtering the relevant columns
df_PER = df_PER[["Player","Pos","Tm","PER","WS/48","VORP"]]
df_PTS = df_PTS[["Player","Age","Tm","MP","PTS Gen"]]
df_salary = df_salary[["Player","Tm","Salary"]]

# Merging the dataframes
df=pd.merge(pd.merge(df_PTS,df_PER,on='Player'),df_salary,on='Player')
df = df.drop_duplicates("Player")
df = df[(df["VORP"] >= 0) & (df["MP"] >= 10)& (df["PTS Gen"] >= 100)]
df = df[["Player","Tm","Pos","Age","PER","MP","WS/48","VORP","PTS Gen","Salary"]]

df

	Player	Tm	Pos	Age	PER	MP	WS/48	VORP	PTS Gen	Salary
1	Steven Adams	MEM	C	28	17.6	1999	0.163	2.0	784.0	17926829
2	Bam Adebayo	MIA	C	24	21.8	1825	0.188	2.7	1258.0	30351780
7	Grayson Allen	MIL	SG	26	12.7	1805	0.110	1.1	833.0	8500000
8	Jarrett Allen	CLE	C	23	23.0	1809	0.225	2.7	996.0	20000000
9	Jose Alvarado	NOP	PG	23	16.4	834	0.121	0.8	482.0	1563518
...	...	...	...	...	...	...	...	...	...	...
509	Christian Wood	DAL	C	26	19.1	2094	0.112	1.9	1373.0	14317459
510	Delon Wright	WAS	SG	29	13.8	1452	0.121	1.6	530.0	7804878
512	Thaddeus Young	TOR	PF	33	17.0	845	0.126	0.9	426.0	8000000
515	Trae Young	ATL	PG	23	25.4	2652	0.181	4.8	2892.0	37096500
516	Omer Yurtseven	MIA	C	23	17.4	706	0.145	0.2	348.0	1752638

274 rows × 10 columns

df["idx"] = df.index
df["Pos"] = df["Pos"].str.slice(stop=2)
df["Pos"] = df["Pos"].str.replace('-','')

# players_dict has information about the players, with a key indicating the index in df
# positions_dict has information on each player's position
players_dict = {}
positions_dict = {}

for index, row in df.iterrows():
    players_dict[row["idx"]] = {
        "name": row["Player"],
        "Pos": row["Pos"],
        "MP": row["MP"],        
        "PER": row["PER"],
        "Age": row["Age"],
        "WS": row["WS/48"],
        "VORP": row["VORP"],
        "PTS Gen": row["PTS Gen"],
        "Salary": row["Salary"]
    }
    pos = row["Pos"]
    if pos not in positions_dict.keys():
        positions_dict[pos] = [row["idx"]]
    else:
        positions_dict[pos].append(row["idx"])

Model¶

Decision variables¶

$$x_{i}$$

A binary variable that is equal to 1 if player i is selected into our team

#  Create variables and add it to the dictionary
def player_assignment_variable(df):
    # With Googles ortools, we create a solver
    solver = pywraplp.Solver('simple_mip_program',
                             pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
    # the variables
    x_var_dict = {}
    for index, row in df.iterrows():
        x_var_dict[row['idx']] = solver.BoolVar(str('x_'+str(row['idx'])))
    return x_var_dict, solver

Objective: Maximize total PER¶

Player Efficiency Rating (PER) - takes into account stats, such as field goals, free throws, 3-pointers, assists, rebounds, blocks, and steals, and subtracts the negative results of missed shots, turnovers, and personal fouls.

$$ max \sum_{i} PER_{i}*x_{i} $$

def objective_function(solver, x_var_dict, players_dict):
    objective = solver.Objective()
    for x in x_var_dict.keys():
        objective.SetCoefficient(x_var_dict[x], players_dict[x]["PER"])
    objective.SetMaximization()
    solver.Solve()
    return solver, x_var_dict, objective

Constraints¶

# of Players¶

An NBA team has an average of 15 players.

$$ \sum_{i} x_{i} = 15$$

def total_players(solver, x_var_dict):
    ct = solver.Constraint(12, 15, 'TotalPlayers')
    for x in x_var_dict.keys():
        ct.SetCoefficient(x_var_dict[x], 1)
    return solver

Player positions¶

NBA teams play 82 games over 8 months, which can lead many NBA players to be prone to injuries. In order to make up for this we want to have at least 3 players per position(PG, SG, SF, PF, C).

$$ \sum_{i \in PG} x_{i} \geq 3$$

def player_position(solver, x_var_dict, positions_dict):
    for position in positions_dict.keys():
        ct = solver.Constraint(3, 15, f'Players_Pos_{position}')
        for x in positions_dict[position]:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver

Team's salary cap¶

The team’s salary cap must be less than the 2022-23 NBA Salary Cap of $123.655 million

$$ \sum_{i} x_{i}*Salary_{i} \leq 123655000$$

def total_salary(solver, x_var_dict, players_dict):
    ct = solver.Constraint(0, 123655000, 'TotalSalary')
    for x in x_var_dict.keys():
        ct.SetCoefficient(x_var_dict[x], players_dict[x]["Salary"])
    return solver

Players Age¶

The players must be under 30 years old

$$ \sum_{i} x_{i}*Age_{i} < 30$$

def player_age(solver, x_var_dict, players_dict):
    ct = solver.Constraint(15,15, 'AGE')
    for x in x_var_dict.keys():
        if players_dict[x]["Age"] < 30:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver

Highest Points¶

The NBA Team must have three players that generates over 2000 points.

Points Generated = points made + assists

$$ \sum_{i \in T} x_{i} \geq 3$$

def pts_gen(solver, x_var_dict, players_dict):
    ct = solver.Constraint(3, 15, 'PTS GEN')
    for x in x_var_dict.keys():
        if players_dict[x]["PTS Gen"] > 2000:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver

Highest Win Shares¶

The NBA Team must have four players that have a WS/48 over 0.22.

WS/48 - a measure that is assigned to players based on their offense, defense, and playing time.

$$ \sum_{i \in T} x_{i} \geq 4$$

def win_share(solver, x_var_dict, players_dict):
    ct = solver.Constraint(4, 15, 'Win_Share')
    for x in x_var_dict.keys():
        if players_dict[x]["WS"] > 0.22:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver

Highest VORP¶

The NBA Team must have three players that have a VORP over 3.

VORP - a box score estimate of the points per 100 TEAM possessions that a player contributed above a replacement-level (-2.0) player, translated to an average team and prorated to an 82-game season.

$$ \sum_{i \in T} x_{i} \geq 3$$

def vorp(solver, x_var_dict, players_dict):
    ct = solver.Constraint(3, 15, 'VORP')
    for x in x_var_dict.keys():
        if players_dict[x]["VORP"] > 3:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver

Compiles all constraints¶

def constraints(df, players_dict, positions_dict, solver):
    solver = total_players(solver, x_var_dict)
    solver = player_position(solver, x_var_dict, positions_dict)
    solver = total_salary(solver, x_var_dict, players_dict)
    solver = win_share(solver, x_var_dict, players_dict)
    solver = vorp(solver, x_var_dict, players_dict)
    solver = pts_gen(solver, x_var_dict, players_dict)
    solver = player_age(solver, x_var_dict, players_dict)
    return solver

Creates the Roster¶

def get_team(x_var_dict, df):
    df_team = pd.DataFrame()
    for idx in x_var_dict:
        if round(x_var_dict[idx].solution_value()) == 1:
            df_player = df[df['idx'] == idx]
            df_team = pd.concat([df_team, df_player], ignore_index=True)
    return df_team

Optimized NBA Team Roster¶

x_var_dict, solver = player_assignment_variable(df)
solver = constraints(df, players_dict, positions_dict, solver)
solver, x_var_dict, objective = objective_function(solver, x_var_dict, players_dict)
df_team = get_team(x_var_dict, df)

Pos_cat = ['PG','SG','SF','PF','C']
df_team["Pos"] = pd.Categorical(df_team["Pos"], categories = Pos_cat)
df_team.sort_values(by = "Pos",inplace=True)
df_team.reset_index(drop=True, inplace=True)

print(f"""
    Total team  PER: {round(df_team.PER.sum(), 2)}. \n 
    Team's salary:  ${round(df_team.Salary.sum()/1000000, 2)} million""")
df_team=df_team[["Player","Tm","Pos","Age","PER","MP","WS/48","VORP","PTS Gen","Salary"]]

df_team

    Total team  PER: 312.4. 
 
    Team's salary:  $123.51 million

	Player	Tm	Pos	Age	PER	MP	WS/48	VORP	PTS Gen	Salary
0	Jose Alvarado	NOP	PG	23	16.4	834	0.121	0.8	482.0	1563518
1	LaMelo Ball	CHO	PG	20	19.7	2422	0.116	3.3	2079.0	8623920
2	Ja Morant	MEM	PG	22	24.4	1889	0.171	3.9	1948.0	12119440
3	Desmond Bane	MEM	SG	23	17.6	2266	0.153	2.7	1592.0	2130240
4	Tyrese Haliburton	IND	SG	21	18.2	2695	0.125	3.1	1809.0	4215120
5	Terry Taylor	IND	SG	22	19.0	714	0.160	0.3	359.0	1563518
6	Keldon Johnson	SAS	SF	22	15.2	2392	0.098	0.9	1436.0	3873025
7	Kenyon Martin Jr.	HOU	SF	21	14.3	1656	0.084	0.4	797.0	1782621
8	Trendon Watford	POR	SF	21	15.8	869	0.104	0.1	445.0	1563518
9	Giannis Antetokounmpo	MIL	PF	27	32.1	2204	0.281	7.4	2390.0	42492492
10	Brandon Clarke	MEM	PF	25	23.7	1246	0.241	2.0	752.0	4343920
11	Isaiah Jackson	IND	PF	20	20.5	541	0.126	0.1	307.0	2573760
12	Nikola Jokić	DEN	C	26	32.8	2476	0.296	9.8	2588.0	33047803
13	Frank Kaminsky	ATL	C	28	21.3	181	0.228	0.3	108.0	1836090
14	Paul Reed	PHI	C	22	21.4	302	0.212	0.3	132.0	1782621

Conclusion¶

As you can see the model was able to satisfy all the constraints and produces an well rounded NBA team with a salary of $123.51 million and a PER of 312.4. Our calculated NBA team has a PER higher than the 2022 Championship Team, Golden State Warriors, PER of 233.7.

Here is a link to the complete Github respository github.com/EthanFalcao/NBA-Optimized-Team

Finding the Most Optimal NBA Team for the 2022-23 Season