Develop by Biomerene · Pull Request #26 · MGXlab/DNNGIOR

Biomerene · 2024-10-04T13:09:50Z

I finally got around to fixing some of the last items of the features I added. As far as I can tell this should not break anything.

New Features:

A command line interface which allows you to automatically convert a folder of genomes to models (assuming modelseedpy does not break).
A black and grey list which allows you to limit or ban reactions from the database
A function to automatically load a medium file (needed for the CLI)
Partially updated example notebooks
Fixed some things related to BiGG
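
As an illustration of the black and grey list idea, here is a minimal sketch. It assumes that candidate reactions are held as a mapping from reaction id to a gap-filling cost, that blacklisted reactions are dropped, and that greylisted reactions are kept at a higher cost. The function name, the `grey_penalty` parameter, and the reaction ids are hypothetical and not taken from dnngior.

```python
def apply_reaction_lists(candidate_costs, blacklist=(), greylist=(), grey_penalty=10.0):
    """Return a new cost mapping with banned reactions dropped and limited ones penalised.

    All names here are hypothetical; this is not the dnngior API.
    """
    banned = set(blacklist)
    limited = set(greylist)
    filtered = {}
    for reaction_id, cost in candidate_costs.items():
        if reaction_id in banned:
            continue  # blacklisted: never offered to the gap-filler
        if reaction_id in limited:
            cost = cost * grey_penalty  # greylisted: still allowed, but more expensive
        filtered[reaction_id] = cost
    return filtered


# Made-up reaction ids and costs, for illustration only.
costs = {"rxnA": 1.0, "rxnB": 0.5, "rxnC": 2.0}
result = apply_reaction_lists(costs, blacklist=["rxnB"], greylist=["rxnC"])
print(result)  # {'rxnA': 1.0, 'rxnC': 20.0}
```

Returning a new mapping rather than editing the input keeps the original database untouched, which makes it easier to rerun gap-filling with different lists.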

It misses some of the later commits of the main branch (mainly related to what @hariszaf did last week I think). Hopefully we can fix this using GitHub. I also have not yet written anything about the CLI in the example notebook or readme as this will take some time. The medium and medium_file arguments might be redundant but I did not want to remove something.

Let me know if I missed something or if you would do something differently.

hariszaf

Thanks @Biomerene !
Very useful PR!
I have a couple of minor comments.
Once we resolve those, we could merge the PR to main and then check what, if anything, from #19 is still needed.

hariszaf · 2024-12-11T12:17:46Z

dnngior/NN_Predictor.py

+            elif modeltype:
+                if modeltype == 'ModelSEED':
+                    self.path = TRAINED_NN_MSEED
+                elif modeltype == 'BiGG':


@Biomerene do we now support BiGG all the way through? 😃

Yes, gap-filling BiGG models should work now. The models do not get refined based on the aliases, because those are part of the ModelSEED database. See also my comment on the refinement in the model build (#26 (comment)).
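
The branch in the diff above picks a trained network based on `modeltype`. A minimal sketch of the same selection as a lookup table follows. The file names and the `resolve_network_path` helper are hypothetical placeholders and do not reflect the real constants in dnngior.

```python
from pathlib import Path

# Hypothetical stand-ins for the package's trained-network path constants.
TRAINED_NN_PATHS = {
    "ModelSEED": Path("networks/modelseed_network.placeholder"),
    "BiGG": Path("networks/bigg_network.placeholder"),
}


def resolve_network_path(modeltype, custom_path=None):
    """Pick the trained network: an explicit path wins, otherwise look up the model type."""
    if custom_path is not None:
        return Path(custom_path)
    try:
        return TRAINED_NN_PATHS[modeltype]
    except KeyError:
        supported = ", ".join(sorted(TRAINED_NN_PATHS))
        raise ValueError(
            f"unknown modeltype {modeltype!r}; expected one of: {supported}"
        ) from None


print(resolve_network_path("BiGG"))
```

A table like this fails loudly on an unsupported database name instead of silently falling through an `elif` chain.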

hariszaf · 2024-12-11T12:23:12Z

dnngior/NN_Trainer.py

    return o

-def generate_training_set(data,nuplo, min_con, max_con, min_for, max_for, del_p, con_p):
+def generate_feature(data,nuplo, min_con, max_con, min_for, max_for, del_p, con_p):


Add a space after "data,".

I added a space (49beebc)

hariszaf · 2024-12-11T12:26:27Z

dnngior/__init__.py


 import os
-os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
+os.environ['TF_CPP_MIN_LOG_LEVEL'] = '5'


Is tensorflow now a strong dependency? If it is optional, I think this would be better placed under an if..else statement, because otherwise I guess it would lead to an error.

I moved this log configuration so that it is set when the tensorflow package is loaded (442d04b).
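
One way to keep the log setting next to an optional import is sketched below. It assumes tensorflow is optional and that the variable only matters if it is set before tensorflow is imported. The helper name `import_quietly` is hypothetical and is not the code from commit 442d04b.

```python
import importlib
import os


def import_quietly(module_name="tensorflow", log_level="3"):
    """Set the TensorFlow C++ log level, then import the module if it is installed.

    Returns the imported module, or None when the module is not available.
    """
    # setdefault keeps any value the user already exported in the shell.
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", log_level)
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None
```

Callers can then check for `None` and skip the neural-network step, or raise a clearer message, when tensorflow is not installed.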

hariszaf · 2024-12-11T12:30:00Z

dnngior/build_model.py

+                # 1) Add metabolite info
+
+                if metabName in compounds_aliases_dict:
+                    metab.annotation = compounds_aliases_dict[metabName].copy()


I guess this is related to #24 right?
If yes, then it would be a good thing to edit your PR message and mention that the issue is addressed in this PR, to be able to track it! 😉

It is related, I think, but I am not sure that the issue is entirely resolved now. We would need to retest the models with memote, and I have not done this yet.
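
The `.copy()` in the diff above matters because assigning the dictionary directly would make every model share the alias database's own entry. A small sketch with made-up entries shows the difference. The variable names mirror the diff, but the ids and values are illustrative only.

```python
# Made-up alias table in the spirit of compounds_aliases_dict from the diff.
compounds_aliases_dict = {"metab_1": {"db_a": "A0001"}}

shared = compounds_aliases_dict["metab_1"]         # same object as the table entry
copied = compounds_aliases_dict["metab_1"].copy()  # independent top-level dict

# Editing the annotation of one metabolite...
shared["db_b"] = "b_1"
copied["db_c"] = "c_1"

# ...leaks into the table only for the shared reference.
print(compounds_aliases_dict["metab_1"])  # {'db_a': 'A0001', 'db_b': 'b_1'}
print(copied)                             # {'db_a': 'A0001', 'db_c': 'c_1'}
```

Note that `dict.copy()` is shallow: if an alias value were itself a list, that list would still be shared, and `copy.deepcopy` would be needed to fully separate the two.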

hariszaf · 2024-12-11T12:33:45Z

dnngior/build_model.py

-
+                reac.name = reac.id
+    elif dbType=='BiGG':
+        print("skipping refinement, not currently supported for BiGG models")


Hmm..
I am a bit confused with the BiGG case.
Would you like to make clear which parts of the dnngior workflow are supported for BiGG models and which are not, so we can implement those in the future?

If I remember correctly, this function makes use of some additional information and functions from the ModelSEED biochemistry to improve the annotations; I think @danielriosgarza built this part. This is not there for the BiGG database, so right now this step is just skipped. I am not sure if we mention the refinement of the ModelSEED models.

hariszaf · 2024-12-11T12:36:04Z

dnngior/fasta2model_CLI.py

+    else:
+        sys.exit('# ERROR: your output_folder should have parents')
+
+def main():


Since this is your main function for the CLI, I suggest you add some documentation.

I added some documentation (with some help from collab); see commit c98534a.

hariszaf · 2024-12-11T12:38:49Z

dnngior/fasta2model_CLI.py

+            gapfill_model_wrapper(args)
+        print('# Done')
+    else:
+        sys.exit('I dont think this message can show, if it does, you (or more likely me) did something weird')


Consider editing this message to return something like
"either a fasta_folder or a model_folder needs to be provided"

This should actually already be caught by the argument parser, but the new message is more informative in case that did not work correctly. I updated it.
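
A sketch of how the parser itself can enforce this check, with the explicit message kept as a fallback, is given below. The flag names `--fasta_folder`, `--model_folder`, and `--output_folder` are inferred from the messages in this thread. Treating the two inputs as mutually exclusive is an assumption, and the returned strings are placeholders for the real work.

```python
import argparse


def build_parser():
    """Build a parser that requires exactly one input source (assumed flag names)."""
    parser = argparse.ArgumentParser(
        prog="fasta2model",
        description="Build and gap-fill models from a folder of genomes or draft models.",
    )
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--fasta_folder", help="folder with genome FASTA files")
    source.add_argument("--model_folder", help="folder with draft models to gap-fill")
    parser.add_argument("--output_folder", required=True, help="where results are written")
    return parser


def main(argv=None):
    """Parse the arguments and dispatch to the matching workflow (placeholders here)."""
    args = build_parser().parse_args(argv)
    if args.fasta_folder is not None:
        return f"building draft models from {args.fasta_folder}"
    if args.model_folder is not None:
        return f"gap-filling models in {args.model_folder}"
    # Defensive fallback; argparse should already have rejected this case.
    raise SystemExit("either a fasta_folder or a model_folder needs to be provided")


print(main(["--fasta_folder", "genomes", "--output_folder", "out"]))
```

With `required=True` on the group, argparse prints its own usage error and exits before `main` reaches the fallback, so the final message only appears if the parser definition changes later.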

hariszaf · 2024-12-11T12:50:18Z

dnngior/reaction_class.py

-                                    side = -1
-                        else:
-                            mets[i] = stoc
+                reactions[reaction]['metabolites'] = eval(react_d[reaction][0])


Why use eval here? If there is a chance of an error, would it not be better to catch it?

This is because the metabolites are a dictionary that is saved as a string in the bigg_reactions.tsv database. Arguably it would be better to save it as a dictionary in the file instead. For now I added a comment in the code (d7bf238).
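
If the stored strings are plain Python literals, `ast.literal_eval` from the standard library could parse them without executing arbitrary code, and the failure case can be caught explicitly. This is a sketch under that assumption. The helper name and the example string are hypothetical, and the actual format of the column in bigg_reactions.tsv is not shown in this thread.

```python
import ast


def parse_metabolites(raw):
    """Parse a stringified metabolite dictionary without running arbitrary code."""
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError) as err:
        raise ValueError(f"could not parse metabolite field: {raw!r}") from err
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a dictionary, got {type(parsed).__name__}")
    return parsed


# Made-up field value, for illustration only.
print(parse_metabolites("{'met_a_c': -1.0, 'met_b_c': 1.0}"))
```

Unlike `eval`, `literal_eval` only accepts literals such as numbers, strings, lists, and dictionaries, so a malformed or malicious field raises an error instead of being executed.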

hariszaf · 2024-12-11T12:54:09Z

dnngior/gapfill_class.py

+        Input:
+            list of reactions to remove from all reactions
+        Output:
+            None


That is a bit funny.
Consider having something like "trimmed list of candidate reactions".

I updated the comment to read "list with reactions removed" (d7bf238), though technically the function does not output anything but modifies the list in place.
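
The distinction between returning a trimmed list and modifying it in place can be shown with a short sketch. The function name and reaction ids are hypothetical and not taken from gapfill_class.py.

```python
def remove_reactions(all_reactions, to_remove):
    """Remove the given reactions from all_reactions in place; returns None."""
    banned = set(to_remove)
    # Slice assignment mutates the existing list object instead of rebinding the name.
    all_reactions[:] = [rxn for rxn in all_reactions if rxn not in banned]


candidates = ["rxn1", "rxn2", "rxn3"]
alias = candidates  # a second reference to the same list
returned = remove_reactions(candidates, ["rxn2"])

print(returned)  # None
print(alias)     # ['rxn1', 'rxn3']
```

Because the list is changed in place, every reference to it sees the trimmed version, which is why a docstring that lists the output as None is technically accurate.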

danielriosgarza and others added 30 commits May 22, 2023 22:17

Merge pull request #11 from hariszaf/develop

371e87a

distribute library with pip

Update example.ipynb

fcad7da

spelling error

first attempt

eb9e5c9

there is a problem with _r vs _rv related to the removed bigg fix

b9e807c

Discovered a (unlikely) bug related to predicting non existant reactions

92aa6f1

related to the other _r vs _rv

7567e8c

Merge branch 'Biomerene-patch-1' of https://github.com/MGXlab/DNNGIOR into Biomerene-patch-1

e6ecd4c

Merge pull request #21 from MGXlab/Biomerene-patch-1

ebe0087

Biomerene patch 1

copied from MSEED_compounds

b52b109

made a change to not include exchange reactions in db reactions

2c0ab2b

script to create exchanges model

eb60862

moved to dnngior

1671227

readded bigg database

e14475b

model with exchanges

178dbe1

slight cleanup

69c58c7

renamed to distinguish between modelseed and BiGG

35c41cd

commandline tool to build models from genomes

61b8e08

test scripts

555f0d1

version control is hard

0296a32

Commandline tool to build models from genomes

9e78ab1

added a blacklist and greylist option

6d8c8d1

Changed from direction to reversibility column for the modelseed

297a32d

made some changes to the loading of BiGG

370a916

removed space

8df6406

removed some modelseed default

4d756ae

changed name

b996749

lazy loading to avoid having to load everything if you want help

173951e

changed name

a98b1cb

faa of fasta, idc

c24c922

please ignore, not doing this right

b8bd592

Biomerene added 12 commits June 12, 2024 14:39

some additional testing

bd72b60

updating the tutorial

83bc8c0

trying to merge to avoid future issues

5ddb294

added a suffix parameter and made the strings f and slightly more consistent

41a5280

updated tutorial to include blacklists

247a78c

new option to load medium file using CLI

8b49573

save media filename to data file

c1dc4b4

broke the testscript while trying to stage blocks

610ae0d

transfer annotation from draftmodel to gapfilled model

1632247

some small additional comments

ac1318f

Caught an exception for when you blacklist reactions that are part of the draft model

66afa3d

updated the example notebook with some of the new features

aeab2bc

Biomerene requested review from danielriosgarza and hariszaf October 4, 2024 13:09

hariszaf and others added 8 commits October 4, 2024 15:55

Merge branch 'main' into develop

9d3fa92

updated help text, added model suffix option and fixed checks, added complete as default medium name

92b3153

added a part about the command line interface and cleaned up the output a bit

2412c15

replaced preprint link with link to publication

e886a8e

merged diversion main into develop

df8b73b

removed old tutorial in the wrong place

d0e680b

added some missing parts, improved some explanations

c1f2d2e

fixed broken link and changed debug message

f44b3d6

hariszaf requested changes Dec 11, 2024

hariszaf approved these changes Dec 11, 2024

Biomerene added 5 commits February 21, 2025 18:41

added space

49beebc

added comments explaining functions and updated error message

c98534a

moved tensorflow specific log config to

442d04b

updated some comments for clarity

d7bf238

fixed return raised in issue #8

0dd1995

Biomerene merged commit 17e08a8 into main Feb 28, 2025
1 check passed

3 participants