LOOSELY JOINT NON-NEGATIVE MATRIX FACTORIZATION FOR IMAGE ANNOTATION APPLICATION

1698 Words4 Pages

3. SHARED SPACE LEARNING VIA LOOSELY JOINT NMF (LJNMF)
Our joint space learning method is formulated under the NMF framework. This section will introduce our adaptation of NMF for extraction of shared latent spaces. We name this approach as the loosely joint non-negative matrix factorization or LJNMF.
The purpose of using NMF framework for image annotation is to explain the underlying latent factors exist in an image collection which made different objects in images by representing the occurrence of these factors for each image. In multimodal problems, different modalities come from the same collection and so we expect the truths which make latent factors in those modalities, were nearly similar.
But each modality has different features so these factors are made differently. Similarity in representation of factors interprets as similarity in coefficient matrix and difference of the way factors are made implies different basic matrices. So forcing cost function to find exactly the same coefficient matrix for both modalities isn’t reasonable and also has restrictive binds which doesn’t let modalities to find best factors and so increases error of approximation.
So we will factorize both modalities, such that the factor matrices were different for them. But in fact they are loosely jointed and the similarity between coefficient matrices is encouraged by reducing the distance between their factors. The objective function for loosely joint non-negative matrix factorization or LJNMF in general mode can be written as below.
(9)
where dist(H1, H2) is a metric for measuring distance between two coefficient.
3.1. Notation
In our problem there are two resources for data as two modalities. One is visual information embedded in images and...

... middle of paper ...

...ases on Corel 5K dataset.
6. CONCLUSION
The problem addressed in this paper is to build a multimodal automatic image annotation system that combines two data modalities: visual features extracted from images and text terms collected from attached tags.
It is done by extracting latent factors which explain patterns make content of images, in a unified loosely joint space. Both modalities are factorized simultaneously while consider to relation exists between them. We relaxed the constraint that make coefficients matrices of both modalities be exactly the same and allowed the representation of latent factors having some differences. This was implemented by minimizing the non-linear distance between two coefficients matrices. The proposed LJNMF algorithm could achieve comparable performance to the state-of-the-art works while having lower dimension of feature vector.

Open Document