Open issue, label: question (Further information is requested)
X_train is an array of 2048-bit RDKit fingerprints, one row per molecule:
```
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 1],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], shape=(5807, 2048))
```
Fitting PCovC to this feature array takes around 5 to 8 minutes:
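```python
from sklearn.linear_model import LogisticRegression
from skmatter.decomposition import PCovC

lr = LogisticRegression()
lr.fit(X_train, y_train)

pcovc = PCovC(mixing=0.05, classifier=lr, n_components=2)
pcovc.fit(X_train, y_train)
```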
Next, X_train is standardized with StandardScaler (fit_transform), giving X_train_scaled:
```
array([[-0.59651099, -0.46626218, -0.49935414, ..., -0.34791159,
        -0.4461504 , -0.92512945],
       [-0.59651099, -0.46626218, -0.49935414, ..., -0.34791159,
        -0.4461504 , -0.92512945],
       [-0.59651099, -0.46626218, -0.49935414, ..., -0.34791159,
        -0.4461504 , -0.92512945],
       ...,
       [-0.59651099, -0.46626218, -0.49935414, ..., -0.34791159,
        -0.4461504 ,  1.08092981],
       [-0.59651099, -0.46626218, -0.49935414, ..., -0.34791159,
        -0.4461504 , -0.92512945],
       [-0.59651099, -0.46626218, -0.49935414, ..., -0.34791159,
        -0.4461504 , -0.92512945]], shape=(5807, 2048))
```
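For reference, StandardScaler subtracts each column's mean and divides by its population standard deviation (ddof=0). The sketch below checks this on a tiny, hypothetical binary matrix (toy data, not the issue's fingerprints) by comparing StandardScaler's output with the formula written out in NumPy.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy "fingerprint": 4 samples x 3 bits (not the issue's data).
# Every column has at least one 0 and one 1, so no column has zero variance.
X = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 1],
], dtype=float)

X_scaled = StandardScaler().fit_transform(X)

# Manual standardization: (x - column mean) / column std, with ddof=0.
# (StandardScaler would leave a constant column at 0 instead of dividing by 0.)
mean = X.mean(axis=0)
std = X.std(axis=0)
X_manual = (X - mean) / std

print(np.allclose(X_scaled, X_manual))  # True
```

For a bit that is set in a fraction p of samples, this maps 0 to -sqrt(p/(1-p)) and 1 to sqrt((1-p)/p). That is consistent with the last column shown above: -0.92512945 and 1.08092981 have magnitudes whose product is about 1, corresponding to p of roughly 0.46.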
Fitting PCovC to the scaled array takes only around 15 seconds:
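```python
from sklearn.linear_model import LogisticRegression
from skmatter.decomposition import PCovC

lr = LogisticRegression()
lr.fit(X_train_scaled, y_train)

pcovc = PCovC(mixing=0.05, classifier=lr, n_components=2)
pcovc.fit(X_train_scaled, y_train)
```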
It seems strange that fitting on the unscaled fingerprints takes so much longer than fitting on the scaled ones, given that both arrays have the same shape. What could be causing this?
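One possible cause worth checking, offered as an unconfirmed hypothesis rather than an answer: the unscaled array's repr shows no dtype, which NumPy omits for its default integer type, so X_train is likely int64, while StandardScaler returns float64. NumPy only hands matrix products to optimized BLAS routines for floating-point (and complex) dtypes; integer matmul falls back to a much slower loop. If PCovC forms products such as X.T @ X without first casting to float (an assumption about its internals, not verified here), the dtype alone could explain minutes versus seconds. This sketch times X.T @ X on a smaller, randomly generated binary matrix (hypothetical data, not the issue's) in both dtypes.

```python
import time

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a binary fingerprint matrix, kept smaller than
# the issue's (5807, 2048) so the sketch runs quickly.
X_int = (rng.random((800, 256)) < 0.1).astype(np.int64)
X_float = X_int.astype(np.float64)


def time_gram(X):
    """Return the feature-space Gram matrix X.T @ X and the seconds it took."""
    start = time.perf_counter()
    G = X.T @ X  # shape (n_features, n_features)
    return G, time.perf_counter() - start


G_int, t_int = time_gram(X_int)
G_float, t_float = time_gram(X_float)

print(f"int64:   {t_int:.4f} s")
print(f"float64: {t_float:.4f} s")
```

If the int64 product is much slower on your machine, calling pcovc.fit(X_train.astype(np.float64), y_train) would be a quick way to test whether the dtype, rather than the scaling itself, is responsible.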