Description
Hi, thanks for providing such a cool framework to the community. I have a few questions about generating 5' UTRs. I would like to avoid generating sequences that contain the ATG start codon motif, and I used Optimus50 (for 50 nt sequences) rather than the evaluation model shown in the tutorial (for 54 nt sequences).
In both scenarios, however, the generator always produced sequences containing ATG, and in the original BMC Bioinformatics paper the designed 5' UTRs also always contain ATG motifs.
I also noticed the punish ATG function:
def get_punish_atg(pwm_start, pwm_end) :
    def punish(pwm) :
        atg_score = K.sum(pwm[..., pwm_start:pwm_end-2, 0, 0] * pwm[..., pwm_start+1:pwm_end-1, 3, 0] * pwm[..., pwm_start+1:pwm_end-1, 2, 0], axis=-1)
        return atg_score
    return punish

However, I don't quite understand the position slice for the third nucleotide, G: pwm[..., pwm_start+1:pwm_end-1, 2, 0]. It uses the same positions as the slice for T, so shouldn't it be pwm[..., pwm_start+2:pwm_end, 2, 0]?
In that case, would it penalize ATG only at the end of the sequence, or would it penalize every ATG motif along the sequence? I also tried adding a large scalar weight to the seq_loss term corresponding to the ATG penalty, but the generator still produces sequences with ATGs.
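To make the indexing question concrete, here is a minimal NumPy sketch that compares the slices as posted with the shifted slice I am proposing. It is not the library's code: it uses `np.sum` in place of `K.sum`, assumes a PWM of shape (batch, length, 4, 1), and assumes the channel order A=0, C=1, G=2, T=3 (inferred from the indices in the snippet). The helper names `one_hot`, `atg_score_as_posted`, and `atg_score_shifted` are hypothetical.

```python
import numpy as np

# Assumed channel order, inferred from the indices 0 (A), 3 (T), 2 (G) in the snippet.
CHANNELS = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(seq):
    """Encode a sequence as a one-hot PWM of assumed shape (1, len(seq), 4, 1)."""
    pwm = np.zeros((1, len(seq), 4, 1))
    for i, nt in enumerate(seq):
        pwm[0, i, CHANNELS[nt], 0] = 1.0
    return pwm


def atg_score_as_posted(pwm, pwm_start, pwm_end):
    """Same slices as the snippet: T and G are read from the same positions."""
    return np.sum(
        pwm[..., pwm_start:pwm_end - 2, 0, 0]
        * pwm[..., pwm_start + 1:pwm_end - 1, 3, 0]
        * pwm[..., pwm_start + 1:pwm_end - 1, 2, 0],
        axis=-1,
    )


def atg_score_shifted(pwm, pwm_start, pwm_end):
    """Proposed variant: G is read one position after T."""
    return np.sum(
        pwm[..., pwm_start:pwm_end - 2, 0, 0]
        * pwm[..., pwm_start + 1:pwm_end - 1, 3, 0]
        * pwm[..., pwm_start + 2:pwm_end, 2, 0],
        axis=-1,
    )


# A 10 nt test sequence with ATG at positions 1-3 and 6-8.
pwm = one_hot("CATGCCATGA")
print(atg_score_as_posted(pwm, 0, 10))  # [0.]
print(atg_score_shifted(pwm, 0, 10))    # [2.]
```

Under these assumptions, the posted slices give a score of zero for any one-hot sequence, because a single position cannot be both T and G. The shifted variant counts every ATG whose three nucleotides lie between pwm_start and pwm_end, not only one at the end of the sequence.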
Thanks in advance for your time.