Linear Optimized NBA Team
Our goal is to use linear programming (specifically, a binary integer program) to determine the optimal NBA team for the upcoming ‘22-’23 NBA season.
We gathered our data from www.basketball-reference.com and aim to maximize the total Player
Efficiency Rating (PER) of the team. The optimal team must stay under the ‘22-’23 season
salary cap of $123,655,000 and have 15 players, with 3 players per position (PG, SG, SF, PF, C).
Data Sources:
Contracts: www.basketball-reference.com/contracts/players.html
Advanced Stats (2021-22 season): www.basketball-reference.com/leagues/NBA_2022_advanced.html
Normal Stats (2021-22 season): www.basketball-reference.com/leagues/NBA_2022_totals.html
Data
import os
from ortools.linear_solver import pywraplp
import pandas as pd
import numpy as np
df_PER = pd.read_csv("/Users/ethan/Downloads/data_sets/NBA_LP_PER.csv", sep=";")
df_PTS = pd.read_csv("/Users/ethan/Downloads/data_sets/NBA_LP_PTS.csv", sep=";")
df_salary = pd.read_csv("/Users/ethan/Downloads/data_sets/NBA_LP_Salary.csv", sep=";")
# Keep only the relevant columns
df_PER = df_PER[["Player","Pos","Tm","PER","WS/48","VORP"]]
df_PTS = df_PTS[["Player","Age","Tm","MP","PTS Gen"]]
df_salary = df_salary[["Player","Tm","Salary"]]
# Merge the three tables on the player's name
df = pd.merge(pd.merge(df_PTS, df_PER, on="Player"), df_salary, on="Player")
# Players listed on more than one row (e.g. after a mid-season trade): keep the first row
df = df.drop_duplicates("Player")
# Drop players with negative VORP, fewer than 10 total minutes, or fewer than 100 points generated
df = df[(df["VORP"] >= 0) & (df["MP"] >= 10) & (df["PTS Gen"] >= 100)]
# "Tm" comes from the salary table; the first merge renamed the other two Tm columns to Tm_x and Tm_y
df = df[["Player","Tm","Pos","Age","PER","MP","WS/48","VORP","PTS Gen","Salary"]]
df
| | Player | Tm | Pos | Age | PER | MP | WS/48 | VORP | PTS Gen | Salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Steven Adams | MEM | C | 28 | 17.6 | 1999 | 0.163 | 2.0 | 784.0 | 17926829 |
| 2 | Bam Adebayo | MIA | C | 24 | 21.8 | 1825 | 0.188 | 2.7 | 1258.0 | 30351780 |
| 7 | Grayson Allen | MIL | SG | 26 | 12.7 | 1805 | 0.110 | 1.1 | 833.0 | 8500000 |
| 8 | Jarrett Allen | CLE | C | 23 | 23.0 | 1809 | 0.225 | 2.7 | 996.0 | 20000000 |
| 9 | Jose Alvarado | NOP | PG | 23 | 16.4 | 834 | 0.121 | 0.8 | 482.0 | 1563518 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 509 | Christian Wood | DAL | C | 26 | 19.1 | 2094 | 0.112 | 1.9 | 1373.0 | 14317459 |
| 510 | Delon Wright | WAS | SG | 29 | 13.8 | 1452 | 0.121 | 1.6 | 530.0 | 7804878 |
| 512 | Thaddeus Young | TOR | PF | 33 | 17.0 | 845 | 0.126 | 0.9 | 426.0 | 8000000 |
| 515 | Trae Young | ATL | PG | 23 | 25.4 | 2652 | 0.181 | 4.8 | 2892.0 | 37096500 |
| 516 | Omer Yurtseven | MIA | C | 23 | 17.4 | 706 | 0.145 | 0.2 | 348.0 | 1752638 |
274 rows × 10 columns
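The merge, `drop_duplicates` and filter steps can be mimicked in plain Python to make their order explicit. The order is: join on the player name, keep the first row per player, then drop rows below the thresholds. This is a simplified standard-library sketch on hypothetical rows (names, teams and numbers are invented), not the notebook's code. `merge_dedupe_filter` is an illustrative helper name.

```python
# Hypothetical stats rows; "Player A" appears twice, as a player who changed teams would
stats_rows = [
    {"Player": "Player A", "Tm": "AAA", "MP": 1500, "VORP": 1.2, "PTS Gen": 900.0},
    {"Player": "Player A", "Tm": "BBB", "MP": 700, "VORP": 0.5, "PTS Gen": 400.0},
    {"Player": "Player B", "Tm": "CCC", "MP": 8, "VORP": 0.0, "PTS Gen": 4.0},
    {"Player": "Player C", "Tm": "DDD", "MP": 1200, "VORP": -0.4, "PTS Gen": 600.0},
    {"Player": "Player D", "Tm": "EEE", "MP": 900, "VORP": 0.3, "PTS Gen": 350.0},
]
# Hypothetical salary table keyed by player name; Player D has no contract row
salaries = {"Player A": 5_000_000, "Player B": 1_000_000, "Player C": 2_000_000}


def merge_dedupe_filter(stats_rows, salaries):
    """Inner-join on Player, keep each player's first row, then apply the notebook's filter."""
    seen = set()
    kept = []
    for row in stats_rows:
        name = row["Player"]
        if name not in salaries:  # inner join: no salary row means no player
            continue
        if name in seen:  # like drop_duplicates("Player"): the first row wins
            continue
        seen.add(name)
        # Same thresholds as the pandas filter: VORP >= 0, MP >= 10, PTS Gen >= 100
        if row["VORP"] >= 0 and row["MP"] >= 10 and row["PTS Gen"] >= 100:
            kept.append({**row, "Salary": salaries[name]})
    return kept


players = merge_dedupe_filter(stats_rows, salaries)
print([(p["Player"], p["Tm"], p["Salary"]) for p in players])
```

De-duplication runs before filtering here, as in the notebook. A player whose first row fails a threshold is therefore dropped even if a later row would have passed.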
df["idx"] = df.index
# Keep only each player's primary position: "SG-PG" -> "SG", "C-PF" -> "C-" -> "C"
df["Pos"] = df["Pos"].str.slice(stop=2)
df["Pos"] = df["Pos"].str.replace('-', '', regex=False)
# players_dict maps each player's index in df to that player's stats
# positions_dict maps each position to the indices of the players listed there
players_dict = {}
positions_dict = {}
for index, row in df.iterrows():
    players_dict[row["idx"]] = {
        "name": row["Player"],
        "Pos": row["Pos"],
        "MP": row["MP"],
        "PER": row["PER"],
        "Age": row["Age"],
        "WS": row["WS/48"],
        "VORP": row["VORP"],
        "PTS Gen": row["PTS Gen"],
        "Salary": row["Salary"]
    }
    pos = row["Pos"]
    if pos not in positions_dict:
        positions_dict[pos] = [row["idx"]]
    else:
        positions_dict[pos].append(row["idx"])
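The position clean-up and the `positions_dict` grouping can be checked in isolation with plain strings. This sketch uses hypothetical (index, position) pairs and an illustrative helper `primary_position`. It also swaps the if/else for `dict.setdefault`, which builds the same mapping.

```python
def primary_position(pos):
    # Same two steps as the pandas code: keep the first two characters,
    # then drop a leftover hyphen ("C-PF" -> "C-" -> "C")
    return pos[:2].replace("-", "")


# Hypothetical (index, listed position) pairs
rows = [(1, "C"), (7, "SG"), (9, "PG"), (12, "SG-PG"), (15, "C-PF"), (20, "PF-C")]

positions = {}
for idx, pos in rows:
    # setdefault creates the list the first time a position is seen
    positions.setdefault(primary_position(pos), []).append(idx)

print(positions)
```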
# Create one binary decision variable per player and store them in a dictionary
def player_assignment_variable(df):
    # Create a mixed-integer solver with Google's OR-Tools (CBC backend)
    solver = pywraplp.Solver('simple_mip_program',
                             pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
    # x_i = 1 if player i is selected, 0 otherwise
    x_var_dict = {}
    for index, row in df.iterrows():
        x_var_dict[row['idx']] = solver.BoolVar('x_' + str(row['idx']))
    return x_var_dict, solver
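Objective: Maximize total PER
Player Efficiency Rating (PER) is a per-minute rating that credits positive box-score contributions, such as field goals, free throws, 3-pointers, assists, rebounds, blocks, and steals. It subtracts negative ones such as missed shots, turnovers, and personal fouls.
$$ \max \sum_{i} \mathrm{PER}_{i} \, x_{i} $$
where $x_i = 1$ if player $i$ is selected and $0$ otherwise.
def objective_function(solver, x_var_dict, players_dict):
    objective = solver.Objective()
    # Each selected player adds their PER to the objective
    for x in x_var_dict.keys():
        objective.SetCoefficient(x_var_dict[x], players_dict[x]["PER"])
    objective.SetMaximization()
    # The model is solved here, so call this only after all constraints are added
    solver.Solve()
    return solver, x_var_dict, objective
def total_players(solver, x_var_dict):
    # Between 12 and 15 players in total; the position and age constraints
    # below push the final roster to exactly 15
    ct = solver.Constraint(12, 15, 'TotalPlayers')
    for x in x_var_dict.keys():
        ct.SetCoefficient(x_var_dict[x], 1)
    return solver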
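To see what the solver optimizes, consider a tiny hypothetical pool. The sketch enumerates every roster of exactly `k` players, discards those over a salary cap, and keeps the one with the largest PER sum. This is plain exhaustive search, not the CBC solver the notebook uses. The number of candidate rosters grows combinatorially, which is why an integer-programming solver is needed for the real 274-player pool. `best_roster` and the pool values are illustrative.

```python
from itertools import combinations

# Hypothetical pool: (name, PER, salary in $ millions)
pool = [("A", 30.0, 40.0), ("B", 25.0, 30.0), ("C", 20.0, 5.0),
        ("D", 18.0, 2.0), ("E", 15.0, 1.0)]


def best_roster(pool, k, cap):
    """Exhaustively pick k players maximizing total PER with total salary <= cap."""
    best, best_per = None, float("-inf")
    for combo in combinations(pool, k):
        salary = sum(p[2] for p in combo)
        if salary > cap:
            continue  # violates the salary constraint
        per = sum(p[1] for p in combo)  # objective: sum of PER_i * x_i
        if per > best_per:
            best, best_per = combo, per
    # An infeasible problem returns an empty roster and -inf
    return [p[0] for p in (best or ())], best_per


print(best_roster(pool, 3, 50.0))
```

With a $50 million cap, players A and B cannot both be chosen (together they cost $70 million). The best three-player roster is therefore A, C and D, with a total PER of 68.0.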
Player positions
NBA teams play 82 games over roughly 8 months, which leaves many players prone to injury. To cover for this, we want at least 3 players at each position (PG, SG, SF, PF, C). Here $P$ is the set of players listed at a given position:
$$ \sum_{i \in P} x_{i} \geq 3 \quad \forall P \in \{\mathrm{PG}, \mathrm{SG}, \mathrm{SF}, \mathrm{PF}, \mathrm{C}\} $$
def player_position(solver, x_var_dict, positions_dict):
    # One constraint per position: between 3 and 15 selected players
    for position in positions_dict.keys():
        ct = solver.Constraint(3, 15, f'Players_Pos_{position}')
        for x in positions_dict[position]:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver
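One way to sanity-check the position constraint on a candidate roster is to count the selected players per position. This sketch uses a hypothetical selection and an illustrative helper `short_positions`, which flags any position with fewer than three players.

```python
from collections import Counter

POSITIONS = ["PG", "SG", "SF", "PF", "C"]


def short_positions(selected_positions, minimum=3):
    """Return the positions whose count is below `minimum` (empty list = satisfied)."""
    counts = Counter(selected_positions)
    return [p for p in POSITIONS if counts[p] < minimum]


# Hypothetical 15-player selection with only two small forwards
roster = ["PG"] * 3 + ["SG"] * 3 + ["SF"] * 2 + ["PF"] * 4 + ["C"] * 3
print(short_positions(roster))
```

With five positions, at least three players each and at most 15 players in total, every position ends up with exactly three players.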
Team's salary cap
The team's total salary must not exceed the 2022-23 NBA salary cap of $123.655 million.
$$ \sum_{i} \mathrm{Salary}_{i} \, x_{i} \leq 123{,}655{,}000 $$
def total_salary(solver, x_var_dict, players_dict):
    # Weighted sum of selected salaries, bounded by the cap
    ct = solver.Constraint(0, 123655000, 'TotalSalary')
    for x in x_var_dict.keys():
        ct.SetCoefficient(x_var_dict[x], players_dict[x]["Salary"])
    return solver
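The salary constraint is a single linear inequality. Checking a roster against it is a dot product of the 0/1 selection vector with the salary vector. A minimal sketch with hypothetical salaries; the cap is the 2022-23 figure used above, and `salary_slack` is an illustrative name.

```python
CAP = 123_655_000  # 2022-23 salary cap used in the notebook


def salary_slack(x, salaries, cap=CAP):
    """Cap room left for selection vector x; negative means the cap is exceeded."""
    payroll = sum(xi * s for xi, s in zip(x, salaries))
    return cap - payroll


# Hypothetical salaries for four candidate players
salaries = [42_000_000, 33_000_000, 30_000_000, 20_000_000]
print(salary_slack([1, 1, 1, 0], salaries))
print(salary_slack([1, 1, 1, 1], salaries))
```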
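Player Age
All players must be under 30 years old. In the model, every player younger than 30 gets a coefficient of 1, and exactly 15 of them must be selected:
$$ \sum_{i \,:\, \mathrm{Age}_{i} < 30} x_{i} = 15 $$
def player_age(solver, x_var_dict, players_dict):
    # Exactly 15 selected players must be under 30; players aged 30+ have
    # coefficient 0, and the 15-player limit leaves no room for them
    ct = solver.Constraint(15, 15, 'AGE')
    for x in x_var_dict.keys():
        if players_dict[x]["Age"] < 30:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver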
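Players aged 30 or older get a coefficient of 0. This equality therefore demands 15 under-30 players, and the 15-player upper bound in `total_players` leaves no room for anyone older. A sketch of that logic on hypothetical ages; `age_row` and `satisfies_age` are illustrative helpers.

```python
def age_row(ages, limit=30):
    # Constraint coefficients: 1 for players younger than `limit`, 0 otherwise
    return [1 if a < limit else 0 for a in ages]


def satisfies_age(x, ages, required=15, max_players=15):
    """Exactly `required` selected under-30 players and no more than `max_players` overall."""
    under_30 = sum(c * xi for c, xi in zip(age_row(ages), x))
    return under_30 == required and sum(x) <= max_players


# Hypothetical pool: 15 players aged 22 and one 31-year-old
ages = [22] * 15 + [31]
print(satisfies_age([1] * 15 + [0], ages))   # all young players
print(satisfies_age([1] * 16, ages))         # adds the 31-year-old: too many players
print(satisfies_age([1] * 14 + [0, 1], ages))  # swaps one young player for the 31-year-old
```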
Highest Points
The team must have at least three players who generated more than 2,000 points.
Points Generated (PTS Gen) = points scored + assists
$$ \sum_{i \,:\, \text{PTS Gen}_{i} > 2000} x_{i} \geq 3 $$
def pts_gen(solver, x_var_dict, players_dict):
    # Count selected players with more than 2,000 points generated
    ct = solver.Constraint(3, 15, 'PTS GEN')
    for x in x_var_dict.keys():
        if players_dict[x]["PTS Gen"] > 2000:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver
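The PTS Gen column follows the definition above, points scored plus assists. This sketch computes it for hypothetical season totals and applies the strict "more than 2,000" test. `pts_generated` and the player totals are illustrative.

```python
def pts_generated(points, assists):
    # Definition used in this notebook: points scored + assists
    return points + assists


# Hypothetical season totals: (name, points, assists)
players = [("A", 1800, 450), ("B", 1500, 300), ("C", 1650, 400)]

high_volume = [name for name, pts, ast in players if pts_generated(pts, ast) > 2000]
print(high_volume)
```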
Highest Win Shares
The team must have at least four players with a WS/48 above 0.22.
WS/48 (Win Shares per 48 minutes): an estimate of the wins a player contributes through offense and defense, normalized to 48 minutes of playing time.
$$ \sum_{i \,:\, \text{WS/48}_{i} > 0.22} x_{i} \geq 4 $$
def win_share(solver, x_var_dict, players_dict):
    # Count selected players with WS/48 above 0.22
    ct = solver.Constraint(4, 15, 'Win_Share')
    for x in x_var_dict.keys():
        if players_dict[x]["WS"] > 0.22:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver
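WS/48 can be computed from season Win Shares (WS) and minutes played (MP) as WS × 48 / MP. Because it is a rate, a player with few minutes can clear the 0.22 bar while a high-minute starter misses it. Frank Kaminsky (0.228 over 181 minutes) in the roster table below is an example. The WS and MP values in this sketch are hypothetical.

```python
def ws_per_48(win_shares, minutes):
    # Win Shares scaled to a 48-minute game
    return win_shares * 48 / minutes


starter = ws_per_48(10.0, 2400)  # hypothetical: 10 WS over 2,400 minutes
reserve = ws_per_48(1.0, 200)    # hypothetical: 1 WS over 200 minutes
print(round(starter, 3), round(reserve, 3))
```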
Highest VORP
The team must have at least three players with a VORP above 3.
VORP (Value Over Replacement Player): a box score estimate of the points per 100 team possessions that a player contributed above a replacement-level (-2.0) player, translated to an average team and prorated to an 82-game season.
$$ \sum_{i \,:\, \mathrm{VORP}_{i} > 3} x_{i} \geq 3 $$
def vorp(solver, x_var_dict, players_dict):
    # Count selected players with VORP above 3
    ct = solver.Constraint(3, 15, 'VORP')
    for x in x_var_dict.keys():
        if players_dict[x]["VORP"] > 3:
            ct.SetCoefficient(x_var_dict[x], 1)
    return solver
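The PTS Gen, WS/48 and VORP constraints share one shape: count the selected players whose stat strictly exceeds a threshold, and require that count to reach a minimum. A small helper makes the shared pattern explicit. `min_count_above` is an illustrative name, not part of the notebook, and the stat values are hypothetical. As in the code above, the comparison is strict, so a VORP of exactly 3.0 would not count.

```python
def min_count_above(selected, stat, threshold, minimum):
    """True if at least `minimum` selected players have stat > threshold."""
    count = sum(1 for player in selected if player[stat] > threshold)
    return count >= minimum


# Hypothetical selected players
selected = [
    {"VORP": 4.1, "WS/48": 0.25, "PTS Gen": 2100},
    {"VORP": 3.5, "WS/48": 0.18, "PTS Gen": 1900},
    {"VORP": 2.9, "WS/48": 0.23, "PTS Gen": 2300},
    {"VORP": 3.2, "WS/48": 0.12, "PTS Gen": 800},
]

print(min_count_above(selected, "VORP", 3, 3))        # three players above 3
print(min_count_above(selected, "WS/48", 0.22, 4))    # only two above 0.22
print(min_count_above(selected, "PTS Gen", 2000, 3))  # only two above 2,000
```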
Compiles all constraints
def constraints(df, players_dict, positions_dict, solver):
    # x_var_dict is not a parameter: this uses the global x_var_dict created by
    # player_assignment_variable(), which must therefore be called first
    # (df is currently unused)
    solver = total_players(solver, x_var_dict)
    solver = player_position(solver, x_var_dict, positions_dict)
    solver = total_salary(solver, x_var_dict, players_dict)
    solver = win_share(solver, x_var_dict, players_dict)
    solver = vorp(solver, x_var_dict, players_dict)
    solver = pts_gen(solver, x_var_dict, players_dict)
    solver = player_age(solver, x_var_dict, players_dict)
    return solver
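Collected in one place, the objective and constraints form the binary integer program below. This is our restatement of the functions above, including the 12-15 bounds from `total_players`. It adds no new constraints. The upper bounds of 15 on the position and threshold constraints are omitted because the total-player bound already implies them.

$$
\begin{aligned}
\max_{x} \quad & \sum_{i} \mathrm{PER}_{i} \, x_{i} \\
\text{s.t.} \quad & 12 \leq \sum_{i} x_{i} \leq 15 \\
& \sum_{i \in P} x_{i} \geq 3 && \forall P \in \{\mathrm{PG}, \mathrm{SG}, \mathrm{SF}, \mathrm{PF}, \mathrm{C}\} \\
& \sum_{i} \mathrm{Salary}_{i} \, x_{i} \leq 123{,}655{,}000 \\
& \sum_{i \,:\, \mathrm{Age}_{i} < 30} x_{i} = 15 \\
& \sum_{i \,:\, \text{PTS Gen}_{i} > 2000} x_{i} \geq 3 \\
& \sum_{i \,:\, \text{WS/48}_{i} > 0.22} x_{i} \geq 4 \\
& \sum_{i \,:\, \mathrm{VORP}_{i} > 3} x_{i} \geq 3 \\
& x_{i} \in \{0, 1\} && \forall i
\end{aligned}
$$

The age equality together with $\sum_i x_i \leq 15$ fixes the roster at exactly 15 players. The five position constraints then force exactly 3 players per position.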
Creates the Roster
def get_team(x_var_dict, df):
    # Collect the rows of every player whose variable is 1 in the solution
    df_team = pd.DataFrame()
    for idx in x_var_dict:
        if round(x_var_dict[idx].solution_value()) == 1:
            df_player = df[df['idx'] == idx]
            df_team = pd.concat([df_team, df_player], ignore_index=True)
    return df_team
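`get_team` rounds each `solution_value()` before comparing it to 1. Solvers work in floating point with integrality tolerances, so a selected binary variable can come back as something like 0.9999999 rather than exactly 1.0. A standard-library sketch of the same extraction on hypothetical solution values, contrasted with an exact comparison:

```python
# Hypothetical solution values keyed by player index, as a MIP solver might report them
solution = {1: 0.9999999, 2: 1e-9, 7: 1.0, 8: 0.0, 9: 1.0000002}


def selected_indices(solution):
    # Round to the nearest integer so tiny tolerance errors do not drop a player
    return [idx for idx, value in solution.items() if round(value) == 1]


exact_only = [idx for idx, value in solution.items() if value == 1]  # misses 1 and 9
print(selected_indices(solution), exact_only)
```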
Optimized NBA Team Roster
x_var_dict, solver = player_assignment_variable(df)
solver = constraints(df, players_dict, positions_dict, solver)
solver, x_var_dict, objective = objective_function(solver, x_var_dict, players_dict)
df_team = get_team(x_var_dict, df)
Pos_cat = ['PG','SG','SF','PF','C']
df_team["Pos"] = pd.Categorical(df_team["Pos"], categories = Pos_cat)
df_team.sort_values(by = "Pos",inplace=True)
df_team.reset_index(drop=True, inplace=True)
print(f"""
Total team PER: {round(df_team.PER.sum(), 2)}. \n
Team's salary: ${round(df_team.Salary.sum()/1000000, 2)} million""")
df_team=df_team[["Player","Tm","Pos","Age","PER","MP","WS/48","VORP","PTS Gen","Salary"]]
df_team
Total team PER: 312.4. Team's salary: $123.51 million
| | Player | Tm | Pos | Age | PER | MP | WS/48 | VORP | PTS Gen | Salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jose Alvarado | NOP | PG | 23 | 16.4 | 834 | 0.121 | 0.8 | 482.0 | 1563518 |
| 1 | LaMelo Ball | CHO | PG | 20 | 19.7 | 2422 | 0.116 | 3.3 | 2079.0 | 8623920 |
| 2 | Ja Morant | MEM | PG | 22 | 24.4 | 1889 | 0.171 | 3.9 | 1948.0 | 12119440 |
| 3 | Desmond Bane | MEM | SG | 23 | 17.6 | 2266 | 0.153 | 2.7 | 1592.0 | 2130240 |
| 4 | Tyrese Haliburton | IND | SG | 21 | 18.2 | 2695 | 0.125 | 3.1 | 1809.0 | 4215120 |
| 5 | Terry Taylor | IND | SG | 22 | 19.0 | 714 | 0.160 | 0.3 | 359.0 | 1563518 |
| 6 | Keldon Johnson | SAS | SF | 22 | 15.2 | 2392 | 0.098 | 0.9 | 1436.0 | 3873025 |
| 7 | Kenyon Martin Jr. | HOU | SF | 21 | 14.3 | 1656 | 0.084 | 0.4 | 797.0 | 1782621 |
| 8 | Trendon Watford | POR | SF | 21 | 15.8 | 869 | 0.104 | 0.1 | 445.0 | 1563518 |
| 9 | Giannis Antetokounmpo | MIL | PF | 27 | 32.1 | 2204 | 0.281 | 7.4 | 2390.0 | 42492492 |
| 10 | Brandon Clarke | MEM | PF | 25 | 23.7 | 1246 | 0.241 | 2.0 | 752.0 | 4343920 |
| 11 | Isaiah Jackson | IND | PF | 20 | 20.5 | 541 | 0.126 | 0.1 | 307.0 | 2573760 |
| 12 | Nikola Jokić | DEN | C | 26 | 32.8 | 2476 | 0.296 | 9.8 | 2588.0 | 33047803 |
| 13 | Frank Kaminsky | ATL | C | 28 | 21.3 | 181 | 0.228 | 0.3 | 108.0 | 1836090 |
| 14 | Paul Reed | PHI | C | 22 | 21.4 | 302 | 0.212 | 0.3 | 132.0 | 1782621 |
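As a cross-check, the reported totals and every constraint can be recomputed from the roster table with the standard library alone. The values below are copied from the table; the script only re-adds them. It should reproduce the 312.4 total PER and a payroll of $123,511,606, which leaves $143,394 under the cap.

```python
from collections import Counter

# Final roster from the table: (player, Pos, Age, PER, WS/48, VORP, PTS Gen, Salary)
roster = [
    ("Jose Alvarado", "PG", 23, 16.4, 0.121, 0.8, 482.0, 1563518),
    ("LaMelo Ball", "PG", 20, 19.7, 0.116, 3.3, 2079.0, 8623920),
    ("Ja Morant", "PG", 22, 24.4, 0.171, 3.9, 1948.0, 12119440),
    ("Desmond Bane", "SG", 23, 17.6, 0.153, 2.7, 1592.0, 2130240),
    ("Tyrese Haliburton", "SG", 21, 18.2, 0.125, 3.1, 1809.0, 4215120),
    ("Terry Taylor", "SG", 22, 19.0, 0.160, 0.3, 359.0, 1563518),
    ("Keldon Johnson", "SF", 22, 15.2, 0.098, 0.9, 1436.0, 3873025),
    ("Kenyon Martin Jr.", "SF", 21, 14.3, 0.084, 0.4, 797.0, 1782621),
    ("Trendon Watford", "SF", 21, 15.8, 0.104, 0.1, 445.0, 1563518),
    ("Giannis Antetokounmpo", "PF", 27, 32.1, 0.281, 7.4, 2390.0, 42492492),
    ("Brandon Clarke", "PF", 25, 23.7, 0.241, 2.0, 752.0, 4343920),
    ("Isaiah Jackson", "PF", 20, 20.5, 0.126, 0.1, 307.0, 2573760),
    ("Nikola Jokić", "C", 26, 32.8, 0.296, 9.8, 2588.0, 33047803),
    ("Frank Kaminsky", "C", 28, 21.3, 0.228, 0.3, 108.0, 1836090),
    ("Paul Reed", "C", 22, 21.4, 0.212, 0.3, 132.0, 1782621),
]

total_per = round(sum(p[3] for p in roster), 2)
total_salary = sum(p[7] for p in roster)
position_counts = Counter(p[1] for p in roster)

# One entry per constraint in the model
checks = {
    "15 players": len(roster) == 15,
    "3 per position": all(position_counts[pos] == 3 for pos in ["PG", "SG", "SF", "PF", "C"]),
    "under the cap": total_salary <= 123_655_000,
    "all under 30": all(p[2] < 30 for p in roster),
    ">= 3 with PTS Gen > 2000": sum(p[6] > 2000 for p in roster) >= 3,
    ">= 4 with WS/48 > 0.22": sum(p[4] > 0.22 for p in roster) >= 4,
    ">= 3 with VORP > 3": sum(p[5] > 3 for p in roster) >= 3,
}
print(total_per, total_salary)
print(all(checks.values()))
```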
Conclusion
The model satisfies all of the constraints and produces a well-rounded NBA team with a total salary of $123.51 million and a total PER of 312.4. That is higher than the combined PER of 233.7 for the 2022 champion Golden State Warriors.
The complete GitHub repository is available at github.com/EthanFalcao/NBA-Optimized-Team