@ljwoods2
Collaborator

@ljwoods2 ljwoods2 commented Oct 2, 2025

No description provided.

"train_bools = []\n",
"epitope_bools = []\n",
"rsa_vals = []\n",
"for (esm_emb, seq, train_boolmask, epitope_boolmask, rsa) in bp3.iter_rows():\n",
Collaborator Author

Just a quality-of-life thing: if you have a large number of columns and don't want to unpack the tuple and name everything, you can always iterate over named rows instead:

for row in bp3.iter_rows(named=True):
    esm_emb = row["esm_emb"]

}
],
"source": [
"# --- Transform to Per-Residue Basis ---\n",
Collaborator Author

This cell could also be replaced with something like:

bp3.with_columns(pl.col("seq").str.split("")).explode("esm_emb", "seq", "train_boolmask", "epitope_boolmask", "RSA")

although this would require loading esm_emb into the dataframe as a list rather than a tensor, since polars can't see the length of a tensor and explode needs every exploded column to have the same number of elements per row.

" esm_embeddings = []\n",
" for job_num in range(bp3.shape[0]):\n",
" job_name = bp3.select(\"job_name\")[job_num].item()\n",
" esm_embeddings.append(torch.load(ESM_ENCODING_DIR / (job_name + \".pt\")))\n",
Collaborator Author

I would change this to a list: lists are converted into polars lists when inserted as a column and play nicely with polars. Notice how, in the printed dataframe, polars treats the tensor column as an opaque "object"/binary blob, so polars operations won't work on it.

torch.load(ESM_ENCODING_DIR / (job_name + ".pt")).tolist()

"X_df = train_df[agg_features]\n",
"y_df = train_df[\"epitope_bools\"]\n",
"\n",
"X = X_df.values\n",
Collaborator Author

The polars version of this is:

bp3_res.select(agg_features).to_numpy()

" y_train, y_test = y[train_index], y[test_index]\n",
"\n",
" # --- Scale Features ---\n",
" scaler = StandardScaler() \n",
Collaborator Author

Does BepiPred scale the embedding features? Just curious whether this step is necessary.

3 participants