ATG motif penalty in 5'UTR generation #5

@hongruhu

Hi, thanks for providing such a cool framework to the community. I have a few questions about generating 5'UTRs. I would like to avoid generating sequences that contain the ATG start codon motif, and I used Optimus50 (for 50 nt sequences) rather than the evaluation model shown in the tutorial (for 54 nt sequences).
In both cases, however, the generated sequences always contain ATGs, and the designed 5'UTRs in the original BMC Bioinfo paper also always have ATG motifs.

I also noticed the punish ATG function:

def get_punish_atg(pwm_start, pwm_end) :
    def punish(pwm) :
        atg_score = K.sum(pwm[..., pwm_start:pwm_end-2, 0, 0] * pwm[..., pwm_start+1:pwm_end-1, 3, 0] * pwm[..., pwm_start+1:pwm_end-1, 2, 0], axis=-1)
        return atg_score
    return punish

However, I don't quite understand the position slice for the third nucleotide, G: pwm[..., pwm_start+1:pwm_end-1, 2, 0]. Shouldn't it be pwm[..., pwm_start+2:pwm_end, 2, 0]?

As written, would it penalize an ATG only at the end of the sequence, or all possible ATG motifs along the sequence? I also tried applying a large scalar weight to the seq_loss term corresponding to the ATG penalty, but it still generates sequences with ATGs.
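
To make the indexing question concrete, here is a minimal NumPy sketch. It is not the library code. It assumes the channel order A=0, C=1, G=2, T=3 (inferred from the indices in the function above) and a PWM of shape (batch, length, 4, 1). The helper names `one_hot`, `atg_score_quoted`, and `atg_score_shifted` are made up for illustration. On a one-hot sequence, the quoted slices read T and G at the same position, so the product appears to be zero everywhere, while the shifted slice counts each ATG once.

```python
import numpy as np

# Assumed channel order, inferred from the indices used in the quoted function.
CHANNELS = "ACGT"  # A=0, C=1, G=2, T=3

def one_hot(seq):
    """Encode a sequence as a one-hot PWM of shape (1, len(seq), 4, 1)."""
    pwm = np.zeros((1, len(seq), 4, 1))
    for pos, nt in enumerate(seq):
        pwm[0, pos, CHANNELS.index(nt), 0] = 1.0
    return pwm

def atg_score_quoted(pwm, start, end):
    """Same slices as the quoted function: T and G are both read at position i+1."""
    a = pwm[..., start:end - 2, 0, 0]
    t = pwm[..., start + 1:end - 1, 3, 0]
    g = pwm[..., start + 1:end - 1, 2, 0]
    return np.sum(a * t * g, axis=-1)

def atg_score_shifted(pwm, start, end):
    """Proposed slices: A at position i, T at i+1, G at i+2."""
    a = pwm[..., start:end - 2, 0, 0]
    t = pwm[..., start + 1:end - 1, 3, 0]
    g = pwm[..., start + 2:end, 2, 0]
    return np.sum(a * t * g, axis=-1)

seq = "CCATGCCATGAA"  # contains ATG at positions 2 and 7
pwm = one_hot(seq)
quoted = atg_score_quoted(pwm, 0, len(seq))
shifted = atg_score_shifted(pwm, 0, len(seq))
print(quoted, shifted)  # [0.] [2.]
```

If this sketch matches how the PWM is laid out in the framework, the quoted version would only produce a nonzero penalty for a relaxed PWM in which a single position carries probability for both T and G, which might explain why increasing the weight has no visible effect on the sampled sequences.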

Thanks in advance for your time.
