Image to Paragraph Generation
We take an image as input, analyse what it shows, and generate a suitable caption for it. A language model then extends that caption into a paragraph about 100 words long.
muzzled greyhound dog running around track
muzzled greyhound dog running around track the the old bailey upon the the hotel lying at this who encountered him from in the morning have little man and the riders had not take notice to describe itself said in the moment slowly in the old and have an time to be at my present friend who were left and avoid bewildered to my new father in the night of my friend who had loitered in and words shall be but it was given by many and one of my time was not that the hope of the country how the friends will be one of people
boy in blue stands the beach
boy in blue stands the beach of the other other the other the house of not the real young voice of the republic which the jury the latter and had been your hands you may jekyll taught to have been out and in exactly at last earnest she measured of me as he appeared to be the return of the man of the mind during the little nature you lorry not that he will yet reason nor the hour made the door and most of the most amiable crime with his elbows and son were an dear and walked but in him to him who was
three men are standing on sidewalk sidewalk outside the walks
three men are standing on sidewalk sidewalk outside the walks in the wood and was the course of whole struck no rose in the light of man tomorrow the night wore as in the relief of hope in the two of the water of those brothers now for the old woman the thing of fine and then was driven found in the laboratory of my father but when an old sickness never was and in the hall took and not quite some degree of continued to the hands of my being the remembrance of the the president had taken the night and the continual of this he was often in
person rides bike bike
person rides bike bike he went on his seat on his seat on the village as the bank of the carriages of mr attorneygeneral had looked on the world and it was two where could but had been the first part of his the comedy of its work had his departure is the same time the poole was it with his way was made it yes and again he was it in the two goes that some are in the morning now we go over the appearance at last of these way the pretty is the door for the moment of the great faces
girl in pink dress flowers dressed
girl in pink dress flowers dressed out of the cottagers rose the little which would not all speech again to rage which knew that it would be able to that he was another in mind that his cheek he carried his spirit he was his hands at his touch were not entirely to be the matter two hands is so dear mean to say you can see can recall you that which can say what you have no such kindness dont say the case is not that which he looked in the back for him by the morning he entered and the other path and showed
dog and jumps over the air to catch ball and
dog and jumps over the air to catch ball and for one of these as two minutes as he dwelt at first saw him wrapped up along the door to be on the road beyond the moon gazed and the second it would never be first with at pursuit once more was the first of the earth mr lorry had gathered out in his voice but it was there in to the old man than to see the hope of the men men you are so strong but all you informed it every evening with the hands reason of him and he had made him to look on with the
man player attempts to get for goal ball orange player in the opposing the the
man player attempts to get for goal ball orange player in the opposing the the as it gabelle had with my last the whole of the bastille the other was to him to that mr stryver before to be mr every wind and the the fire and the doctor to that time that night may hardly feel the ever to have the aid of the wineshop had been in tale of them for them daily who had in the same fair that he had been there was the ordinance of her own own kind but the day was not in the ringing of the window of this state of it was not unwilling to the
white furry dog running running through the grass
shirtless in blue and climbing on the top overlooking at of the camera of behind river
lioness lioness field is chasing dog the
The final model is a combination of two models:
- The image captioning model
- A language model
The image captioning model generates a caption for each image, and the language model takes that caption as a seed to generate further words. The two models are trained individually and then combined.
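The way the two stages hand off to each other can be sketched roughly as below. This is only an illustration, not the project's actual code: a tiny bigram (word-pair) model stands in for the trained language model, and `train_bigram_model`, `extend_caption`, `image_to_paragraph`, and the `caption_fn` argument are hypothetical names made up for this sketch. The captioning model is assumed to be any callable that maps an image to a caption string.

```python
import random
from collections import defaultdict


def train_bigram_model(corpus_text):
    """Map each word to the list of words that follow it in the corpus.

    Stand-in for the real language model (assumption for this sketch).
    """
    words = corpus_text.lower().split()
    followers = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word].append(next_word)
    return dict(followers)


def extend_caption(caption, followers, target_length=100, rng=None):
    """Use the caption as a seed and append words until target_length words."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    words = caption.lower().split()
    if not followers:
        return " ".join(words)  # nothing to sample from
    vocabulary = sorted(followers)
    while len(words) < target_length:
        last_word = words[-1] if words else None
        candidates = followers.get(last_word)
        if not candidates:
            # Last word never seen with a follower: pick any known word.
            candidates = vocabulary
        words.append(rng.choice(candidates))
    return " ".join(words)


def image_to_paragraph(image, caption_fn, followers, target_length=100):
    """Stage 1: caption the image. Stage 2: extend the caption."""
    caption = caption_fn(image)  # caption_fn is a hypothetical captioning model
    return extend_caption(caption, followers, target_length)
```

For example, with `followers = train_bigram_model(some_book_text)` and a captioner that returns "person rides bike", `image_to_paragraph(img, captioner, followers)` returns a 100-word string that starts with the caption. Because the two stages only share a plain string, each model can be trained and swapped out independently.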
An obvious way to improve performance is to train on more samples and for longer. I trained using Google Colab and Kaggle (yeah, I am too poor to afford a proper deep learning setup :( ). The caption model does a pretty good job and the captions it generates are good, but the language model is not quite as good. This is due to the training limitations of cloud platforms (session time, GPU time, etc.). One could use a pretrained model such as BERT, but since my objective with this project was to get a grasp of how things work, I chose to create my own model.









