Survey on scene classification based on PLSA

2) Matrix Factorization
Matrix factorization offers another view of pLSA. The word frequency matrix that describes the dataset is very large and sparse: it has one row per document d, and its number of columns equals the number k of distinct words that appear in the corpus. The matrix is sparse because each document uses only a small fraction of the vocabulary, depending on its topic. Since most entries are zero and carry no information, dimensionality reduction is desirable for the word frequency matrix. It can be achieved by approximating the co-occurrence matrix (denoted by F) as a product of two low-rank (thinner) matrices P and R:
F ≈ F̂ = P · R
So, if P has size X×Y and R has size Y×Z, with Y ≪ X, Z, the factorization achieves the dimensionality reduction, since X·Z ≫ X·Y + Y·Z. The matrices P and R also reveal the latent structure of the data. pLSA performs exactly such a matrix factorization of the conditional distribution P(w|d).
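The storage argument can be checked with a few lines of Python. This is only an illustrative sketch: the function names `storage_counts` and `matmul`, the corpus sizes, and the tiny matrices are assumptions made up for the example and do not come from the survey.

```python
def storage_counts(X, Y, Z):
    """Return (entries of the full X-by-Z matrix, entries of the two factors)."""
    full = X * Z                # every cell of F stored explicitly
    factored = X * Y + Y * Z    # cells of P (X-by-Y) plus cells of R (Y-by-Z)
    return full, factored


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Hypothetical corpus: 1,000 documents, 5,000 distinct words, 20 latent topics.
full, factored = storage_counts(X=1000, Y=20, Z=5000)
print(full, factored)

# Tiny shape check: a 3-by-1 matrix times a 1-by-4 matrix gives a 3-by-4 matrix,
# so 12 cells are described by only 3 + 4 = 7 numbers.
P = [[1], [2], [3]]
R = [[1, 0, 2, 1]]
F_hat = matmul(P, R)
print(len(F_hat), len(F_hat[0]))
```

With these assumed sizes the two factors need far fewer entries than the full matrix, which is the inequality X·Z ≫ X·Y + Y·Z in concrete form.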
Written in terms of the joint distribution P(d, w), the factorization takes the three-factor form
F̂ = P · Q · R, where
P contains the document probabilities P(d|z),
Q is a diagonal matrix of the topic prior probabilities P(z), and
R contains the word probabilities P(w|z).
These matrices represent probability distributions and are therefore non-negative and normalized.
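To make the three factors concrete, the following minimal sketch fits them with the standard EM updates for pLSA on a made-up count matrix. It assumes the joint-distribution form P(d, w) = Σ_z P(z) P(d|z) P(w|z). The names `plsa` and `reconstruct`, the toy counts, the number of topics, and the iteration count are all hypothetical choices for illustration, not the survey's own implementation.

```python
import random


def plsa(counts, n_topics, n_iter=30, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a document-by-word count matrix with EM."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    topics = range(n_topics)

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Random, strictly positive starting point. Each factor is stored with one
    # row per topic, so p_d_z is the transpose of the matrix P in the text.
    p_z = normalized([rng.random() + 0.1 for _ in topics])
    p_d_z = [normalized([rng.random() + 0.1 for _ in range(n_docs)]) for _ in topics]
    p_w_z = [normalized([rng.random() + 0.1 for _ in range(n_words)]) for _ in topics]

    for _ in range(n_iter):
        new_d = [[0.0] * n_docs for _ in topics]
        new_w = [[0.0] * n_words for _ in topics]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue  # empty cells contribute nothing
                # E-step: posterior P(z | d, w), up to normalization.
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in topics]
                total = sum(joint)
                if total == 0.0:
                    continue  # guard against numerical underflow
                for z in topics:
                    r = n * joint[z] / total
                    new_d[z][d] += r
                    new_w[z][w] += r
        # M-step: renormalize the expected counts into probabilities.
        mass = [sum(row) for row in new_d]
        p_z = normalized(mass)
        p_d_z = [[v / mass[z] for v in new_d[z]] for z in topics]
        p_w_z = [[v / mass[z] for v in new_w[z]] for z in topics]
    return p_z, p_d_z, p_w_z


def reconstruct(p_z, p_d_z, p_w_z):
    """Joint P(d, w) = sum over z of P(d|z) P(z) P(w|z), i.e. the product P . Q . R."""
    n_docs, n_words = len(p_d_z[0]), len(p_w_z[0])
    return [[sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(len(p_z)))
             for w in range(n_words)] for d in range(n_docs)]


# Made-up counts: 4 documents (rows) over a 5-word vocabulary (columns).
counts = [
    [4, 3, 1, 0, 0],
    [5, 2, 0, 1, 0],
    [0, 1, 0, 4, 3],
    [0, 0, 1, 3, 5],
]
p_z, p_d_z, p_w_z = plsa(counts, n_topics=2)
joint = reconstruct(p_z, p_d_z, p_w_z)
print(sum(p_z), sum(sum(row) for row in joint))
```

Each of the three returned factors is non-negative and sums to one along the appropriate axis, and the reconstructed matrix is itself a probability distribution over (document, word) pairs, which is the normalization property stated above.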

pLSA: procedural view
The accurate results obtained with pLSA have led to its widespread use in practice. Examples include word usage analysis on the Topic Detection and Tracking corpus and image classification models. Scene classification with pLSA involves two primary steps: training and testing.

Fig 3. The complete pLSA formulation design defining its primary stages: Training on images, BOW f...


