Regression model #4
base: main
Conversation
notebooks/regression.ipynb
Outdated
    train_bools = []
    epitope_bools = []
    rsa_vals = []
    for (esm_emb, seq, train_boolmask, epitope_boolmask, rsa) in bp3.iter_rows():
Just a quality-of-life thing: when you have a large number of columns and don't want to unpack the whole tuple and name everything, you can always do this instead:
for row in bp3.iter_rows(named=True):
esm_emb = row["esm_emb"]
notebooks/regression.ipynb
Outdated
    }
    ],
    "source": [
    "# --- Transform to Per-Residue Basis ---\n",
This cell could also be replaced with something like:

    bp3.explode("esm_emb", pl.col("seq").str.split(""), "train_boolmask", "epitope_boolmask", "RSA")

although this would require you to load the esm_emb object into the DataFrame as a list rather than a tensor, since polars doesn't know the length of a tensor.
notebooks/regression.ipynb
Outdated
    esm_embeddings = []
    for job_num in range(bp3.shape[0]):
        job_name = bp3.select("job_name")[job_num].item()
        esm_embeddings.append(torch.load(ESM_ENCODING_DIR / (job_name + ".pt")))
I would change this to be a list, since lists are converted into polars lists when inserted as a column and play nicely with polars. Notice how in the printed DataFrame polars treats the tensor as an opaque "object"/binary blob, so polars operations won't work on it:
torch.load(ESM_ENCODING_DIR / (job_name + ".pt")).tolist()
    X_df = train_df[agg_features]
    y_df = train_df["epitope_bools"]

    X = X_df.values
Polars version of this is:

    bp3_res.select(agg_features).to_numpy()

On the next quoted hunk:

    y_train, y_test = y[train_index], y[test_index]

    # --- Scale Features ---
    scaler = StandardScaler()
Does BepiPred scale embedding features? Just curious whether this is necessary.