A Neural Image Caption Generator

  1. Introduction
  2. Architecture

Introduction

Architecture

(architecture diagram)

  • The PyTorch implementation can be found in encoder_decoder.py.

Encoder

  • The encoder is an EfficientNet with weights pretrained on ImageNet.
  • The final layer of the EfficientNet is removed, and all prior layers are frozen for the duration of training.
  • The extracted image features are passed through a linear layer that reduces their dimensionality to that of the joint embedding space.
  • This projection layer is trained jointly with the decoder so that the joint embedding space is learned end-to-end (see the sketch after this list).
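A minimal sketch of what such an encoder can look like in PyTorch (the class name, the use of torchvision's `efficientnet_b0`, and the 1280-dimensional feature width are illustrative assumptions, not taken from `encoder_decoder.py`):

```python
import torch
import torch.nn as nn
from torchvision import models

class Encoder(nn.Module):
    """EfficientNet backbone with a trainable projection into the joint embedding space."""

    def __init__(self, embed_dim: int):
        super().__init__()
        # Pretrained EfficientNet; the classification head is dropped and only
        # the convolutional feature extractor is kept.
        backbone = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
        self.features = backbone.features
        self.pool = nn.AdaptiveAvgPool2d(1)

        # Freeze all pretrained layers for the duration of training.
        for p in self.features.parameters():
            p.requires_grad = False

        # Only this projection layer is trained jointly with the decoder.
        self.project = nn.Linear(1280, embed_dim)  # 1280 = EfficientNet-B0 feature width (assumed variant)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.pool(self.features(images)).flatten(1)  # (B, 1280)
        return self.project(feats)                               # (B, embed_dim)
```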

Decoder

  • The decoder is an LSTM which generates a caption for the image.
  • At the start of decoding, the feature vector from the encoder is fed into the LSTM as its first input, so the hidden state is conditioned on the embedded representation of the image.
  • A linear layer maps the hidden-state outputs to the vocabulary space, producing a probability distribution over the next word in the caption (see the sketch after this list).
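A minimal sketch of the corresponding decoder (again, class name and dimensions are illustrative assumptions rather than the contents of `encoder_decoder.py`). The image embedding is prepended as the first LSTM input, and a final linear layer produces vocabulary logits:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """LSTM language model conditioned on the image embedding."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Maps hidden states to vocabulary logits, i.e. an (unnormalised)
        # distribution over the next word in the caption.
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_embedding: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Prepend the image embedding as the first "token" so the hidden state
        # sees the image before any words are generated.
        word_embeddings = self.embed(captions)                            # (B, T, embed_dim)
        inputs = torch.cat([image_embedding.unsqueeze(1), word_embeddings], dim=1)
        hidden, _ = self.lstm(inputs)                                     # (B, T+1, hidden_dim)
        return self.fc(hidden)                                            # (B, T+1, vocab_size)
```

During training, such a decoder would typically be driven with teacher forcing, for example:

```python
image_emb = encoder(images)            # (B, embed_dim)
logits = decoder(image_emb, captions)  # score each next-word prediction against the shifted caption
```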

