Image Captioning with CNN-RNN and LSTM

  • Tech Stack: Tensorflow, Keras, Python, CNN-RNN and LSTM, Image processing and NLP
  • Github URL: Project Link

• In this project, I have created a neural network architecture to automatically generate captions from images.

• After using the Microsoft Common Objects in COntext (MS COCO) dataset to train your network, this project' network can be used on novel images!

• I passed all the inputs as a sequence to an LSTM. A sequence looks like this: first a feature vector that is extracted from an input image, then a start word, then the next word, the next word, and so on.

• Embedding Dimension: The LSTM is defined such that, as it sequentially looks at inputs, it expects that each individual input in a sequence is of a consistent size and so we embed the feature vector and each word so that they are embed_size.