3. SHARED SPACE LEARNING VIA LOOSELY JOINT NMF (LJNMF)
Our joint space learning method is formulated under the NMF framework. This section introduces our adaptation of NMF for extracting shared latent spaces. We call this approach loosely joint non-negative matrix factorization (LJNMF).
The purpose of using the NMF framework for image annotation is to uncover the underlying latent factors in an image collection that give rise to the different objects appearing in the images, and to represent each image by the occurrence of these factors. In multimodal problems the modalities are drawn from the same collection, so we expect the underlying phenomena that generate the latent factors in each modality to be nearly the same.
However, each modality has its own features, so these factors are expressed differently. Similar representations of the factors translate into similar coefficient matrices, while the different ways the factors are expressed imply different basis matrices. Forcing the cost function to find exactly the same coefficient matrix for both modalities is therefore unreasonable: such a restrictive constraint prevents each modality from finding its best factors and increases the approximation error.
We therefore factorize both modalities with separate factor matrices. The factorizations are, however, loosely joined: similarity between the coefficient matrices is encouraged by penalizing the distance between them. The objective function of loosely joint non-negative matrix factorization (LJNMF) in its general form can be written as
min_{W1, W2, H1, H2 >= 0}  ||X1 - W1 H1||^2_F + ||X2 - W2 H2||^2_F + lambda * dist(H1, H2)        (9)
where dist(H1, H2) is a metric measuring the distance between the two coefficient matrices and lambda >= 0 weights the coupling between the modalities.
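To make the coupling concrete, the following is a minimal sketch that minimizes an instance of Eq. (9) by projected gradient descent. It is an illustration under stated assumptions, not the paper's algorithm: it uses squared Frobenius reconstruction terms, a squared Euclidean surrogate for dist(H1, H2) (the paper's distance is non-linear), and arbitrary rank, step-size, and iteration defaults; the function name ljnmf and all parameters are hypothetical.

# Sketch of Eq. (9) via projected gradient descent (assumptions noted above).
import numpy as np

def ljnmf(X1, X2, rank=10, lam=0.1, lr=1e-3, iters=500, seed=0):
    """Factorize X1 ~ W1 H1 and X2 ~ W2 H2 with loosely coupled coefficients."""
    rng = np.random.default_rng(seed)
    n = X1.shape[1]                      # both modalities describe the same images
    assert X2.shape[1] == n
    W1 = rng.random((X1.shape[0], rank))
    W2 = rng.random((X2.shape[0], rank))
    H1 = rng.random((rank, n))
    H2 = rng.random((rank, n))
    for _ in range(iters):
        R1 = W1 @ H1 - X1                # reconstruction residual, modality 1
        R2 = W2 @ H2 - X2                # reconstruction residual, modality 2
        D = H1 - H2                      # gradient of the coupling penalty
        gW1 = R1 @ H1.T
        gW2 = R2 @ H2.T
        gH1 = W1.T @ R1 + lam * D
        gH2 = W2.T @ R2 - lam * D
        # gradient step followed by projection onto the non-negative orthant
        W1 = np.maximum(W1 - lr * gW1, 0.0)
        W2 = np.maximum(W2 - lr * gW2, 0.0)
        H1 = np.maximum(H1 - lr * gH1, 0.0)
        H2 = np.maximum(H2 - lr * gH2, 0.0)
    return W1, H1, W2, H2

In this sketch, X1 could hold the visual feature vectors and X2 the tag term vectors of the same images, stacked column-wise; H1 and H2 then give two representations of each image in the shared latent space that the coupling term encourages to agree without forcing them to be identical.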
3.1. Notation
In our problem there are two sources of data, treated as two modalities. One is the visual information embedded in the images and...
... middle of paper ...
...ases on Corel 5K dataset.
6. CONCLUSION
The problem addressed in this paper is to build a multimodal automatic image annotation system that combines two data modalities: visual features extracted from images and text terms collected from attached tags.
This is done by extracting, in a unified loosely joint space, the latent factors that explain the patterns making up the content of images. Both modalities are factorized simultaneously while taking into account the relation that exists between them. We relaxed the constraint that the coefficient matrices of the two modalities be exactly the same and allowed their representations of the latent factors to differ somewhat. This was implemented by minimizing a non-linear distance between the two coefficient matrices. The proposed LJNMF algorithm achieves performance comparable to state-of-the-art work while using a lower-dimensional feature vector.