The overall objective is to cluster near-duplicate images. Initially, the user submits a query to the search engine, which returns a set of query-related images. These images contain duplicates as well as near-duplicates. The main aim of this paper is to detect near-duplicate images and cluster them, which is achieved in three steps: image preprocessing, feature extraction, and clustering. Preprocessing, the initial step of any image-processing pipeline, consists of noise removal and image enhancement. Feature extraction then covers the extraction of key points and key-point matching. The matched key points are used to estimate an affine transform based on an affine-invariant ratio of normalized lengths. Finally, clustering is performed, using both supervised and unsupervised clustering, which yields clusters of images. Each cluster has one image as its representative, and the other images in the cluster are its near-duplicates. Lastly, a performance measure is computed to evaluate the accuracy of the algorithm.
Figure 1 shows the block diagram of the proposed system. The final output is a set of clusters, each consisting of the near-duplicates of that cluster's representative image.
Fig. 1. Block Diagram of the Proposed System
3.1 Image Preprocessing
Pre-processing methods use a small neighborhood of a pixel in the input image to compute a new brightness value in the output image; this is also called filtering. Local pre-processing methods can be divided into two groups according to the goal of the processing: smoothing suppresses noise and other small fluctuations in the image, which is equivalent to suppressing high...
... middle of paper ...
...o cut. The idea, in brief, is to cluster roughly half of the data with hierarchical clustering and then handle the remainder with k-means. To create super-rules, the hierarchical stage is terminated when it generates the largest number of clusters.
Algorithm:
1. Run a complete agglomerative hierarchical clustering on the data and record the number of clusters generated at each step.
2. Run the agglomerative hierarchical clustering again, stopping when the largest number of clusters has been generated.
3. Run k-means on the remaining data not processed in step 2, using the centroids of the clusters from step 2 as the initial centroids of the k-means algorithm.
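The three steps above can be sketched as follows. This is a toy, illustrative implementation, not the paper's code: all names and the 2-D sample data are assumptions, single linkage stands in for the unspecified linkage criterion, and step 2's "largest number of clusters" stopping rule is simplified to a fixed cluster count.

```python
import math, random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def centroid(cluster):
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def agglomerative(points, k):
    # Single-linkage agglomerative clustering, stopped once k clusters remain
    # (a simplification of step 2's stopping criterion).
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

def kmeans(points, centroids, iters=10):
    # Step 3: k-means on the remaining data, seeded with the given centroids.
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: dist(p, centroids[c]))
            buckets[nearest].append(p)
        centroids = [centroid(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids, buckets

random.seed(0)
data = ([(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(20)] +
        [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(20)])
random.shuffle(data)
half, rest = data[:20], data[20:]

seeds = [centroid(c) for c in agglomerative(half, k=2)]  # steps 1-2 (simplified)
centroids, clusters = kmeans(rest, seeds)                # step 3
print(len(clusters), "clusters of sizes", [len(b) for b in clusters])
```

Seeding k-means with the hierarchical centroids is what makes the hybrid work: the expensive pairwise-distance stage only runs on half the data, and k-means starts close to a good solution.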
Once the clustering process is complete, a set of clusters is obtained. Each cluster represents a set of near-duplicates with one representative image.

Related

## Biometric Iris Recognition: A Literature Survey

- 1099 Words
- 3 Pages

[106] also proposed an iris recognition algorithm for noisy iris images based on the bi-orthogonal wavelet transform. Zafar et al. [107] proposed a simple segmentation method using a Gaussian filter followed by a Canny edge detector after the necessary preprocessing. Feature extraction is then applied to the segmented images to represent the iris in fewer dimensions.
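As a rough illustration of this preprocessing chain, the sketch below applies Gaussian smoothing and then a gradient-magnitude threshold to a synthetic disc image. It is a simplified stand-in for a true Canny detector (no non-maximum suppression or hysteresis), and every name and parameter in it is an assumption.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def convolve2d(img, kernel):
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def edges(img, thresh=0.3):
    # Gaussian smoothing, then Sobel gradients, then a relative threshold.
    smoothed = convolve2d(img, gaussian_kernel())
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    gx = convolve2d(smoothed, sx)
    gy = convolve2d(smoothed, sx.T)
    mag = np.hypot(gx, gy)
    return mag > thresh * mag.max()

# Toy "iris" image: a bright disc on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
img = ((xx - 32)**2 + (yy - 32)**2 < 15**2).astype(float)
mask = edges(img)
print(mask.sum(), "edge pixels found")
```

The edge mask fires only around the disc boundary; the flat interior and background survive the threshold as non-edges.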

In automatic image annotation syst... ... middle of paper ... ...revious groups, both modalities are decomposed into their factors, but sometimes one modality plays the main role and the other modalities help the main modality to be factorized better. This guidance is usually provided by defining regularization parameters. There is a great deal of research in this group. For example, [19] works within a unified probabilistic model, using tagging data to select the neighbors of each item and then adding a unique Gaussian distribution to each item's latent feature vector in the matrix factorization, to ensure that similar items have similar latent features. Li et al. [20] designed a nonlinear NMF method with priors on the inter- and intra-correlations among images and tags to predict the tag relevance of images.

- 1631 Words
- 4 Pages

## The Process of Quantization

- 680 Words
- 2 Pages

In scalar quantization, each input symbol is treated separately in producing the output, while in vector quantization the input symbols are grouped together into vectors and processed as a single unit to give the output. This grouping of data increases the optimality of the vector quantizer, but at the cost of increased computational complexity. Coefficients that correspond to smooth parts of the data become small. (Indeed, their differences, and therefore their associated wavelet coefficients, will be zero or very close to it.) So we can discard these coefficients without significantly distorting the image.
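The contrast between the two quantizers can be shown in a few lines; the step size and codebook below are made-up toy values, not from the essay.

```python
import math

def scalar_quantize(x, step=0.5):
    # Each sample is quantized independently to the nearest multiple of step.
    return round(x / step) * step

def vector_quantize(vec, codebook):
    # A whole group of samples is mapped to its nearest codebook vector.
    return min(codebook, key=lambda c: math.dist(vec, c))

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]

print(scalar_quantize(0.68))                   # -> 0.5
print(vector_quantize((0.9, 0.2), codebook))   # -> (1.0, 0.0)
```

With the vector quantizer, the pair (0.9, 0.2) is coded as one codebook index instead of two independent scalar levels, which is where the optimality gain (and the extra search cost) comes from.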

## Image Processing

- 1733 Words
- 4 Pages

The algorithm also accounts for discontinuities in the shape contour and can reach nearby pixels. The contour trace starts from the top-left point, i.e., the pixel closest to the shape, and proceeds clockwise, following the surroundings of the shape's contour rather than the contour itself. The path around the contour is traced in a look-forward sweep pattern to find the next surrounding point closest to the contour. The path is closed when the start point is reached again.
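A closely related standard procedure is Moore-neighbour tracing, which follows the boundary pixels themselves (rather than the surrounding path the text describes) clockwise from the top-left foreground pixel. The sketch below, with assumed names and a toy rectangle, illustrates the look-forward sweep and the closing condition.

```python
def trace_contour(grid):
    """Moore-neighbour trace: follow boundary pixels clockwise from the
    top-left foreground pixel until the start pixel is revisited."""
    h, w = len(grid), len(grid[0])
    # 8 neighbour offsets in clockwise order, starting from west.
    dirs = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
            (0, 1), (1, 1), (1, 0), (1, -1)]
    start = next((r, c) for r in range(h) for c in range(w) if grid[r][c])
    contour, p, d = [start], start, 0   # d: direction we "entered" from
    while True:
        for k in range(1, 9):           # look-forward sweep, clockwise
            nd = (d + k) % 8
            r, c = p[0] + dirs[nd][0], p[1] + dirs[nd][1]
            if 0 <= r < h and 0 <= c < w and grid[r][c]:
                p, d = (r, c), (nd + 6) % 8   # back up two steps for next sweep
                break
        else:
            return contour              # isolated pixel, nothing to follow
        if p == start:
            return contour              # path closed at the start point
        contour.append(p)

grid = [[0] * 5 for _ in range(5)]
for r in (1, 2):
    for c in (1, 2, 3):
        grid[r][c] = 1                  # a 2x3 rectangle of foreground pixels
print(trace_contour(grid))
```

For the rectangle, the trace visits all six boundary pixels in clockwise order and stops exactly when the starting pixel reappears.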

## Cluster Analysis: Clustering Methods

- 722 Words
- 2 Pages

Centroid method: calculates the distance between two clusters as the total of the distances between their means over all of the variables (J. Norušis, 2012). e. Ward's method: analyzes the variance of the data points to determine the distance between clusters. For each cluster, the means of all variables are calculated, as well as the squared Euclidean distances to the cluster means (J. Norušis, 2012). b) Divisive clustering: starts with all data in one cluster and ends with each data point in its own cluster. Moreover, this could also be verified with another method if it does not satisfy the model.
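A tiny numeric example can make the two criteria concrete. The clusters below are made up; the centroid distance is taken as the squared Euclidean distance between cluster means, and the Ward distance as the increase in total within-cluster sum of squares caused by a merge (one common formulation, assumed here).

```python
def mean(pts):
    return [sum(xs) / len(pts) for xs in zip(*pts)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def wss(pts):
    # within-cluster sum of squares around the cluster mean
    m = mean(pts)
    return sum(sq_dist(p, m) for p in pts)

A = [(0, 0), (0, 2)]
B = [(4, 0), (4, 2)]

centroid_distance = sq_dist(mean(A), mean(B))   # distance between cluster means
ward_increase = wss(A + B) - wss(A) - wss(B)    # cost of merging A and B
print(centroid_distance, ward_increase)
```

For equally sized clusters the two numbers coincide here; in general Ward's criterion also weights the merge by the cluster sizes, so it penalizes merging large clusters more heavily than the centroid method does.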

## Non-negative Patch Alignment Framework

- 2639 Words
- 6 Pages

To solve NPAF, we propose a general multiplicative update rule (MUR). By introducing a new auxiliary function, we theoretically show that MUR for NPAF converges. As an application of NPAF, we develop a new NMF-based dimension reduction algorithm, termed non-negative discriminative locality alignment (NDLA), wherein the learned bases exploit the latent local geometric structure in the dataset as well as the discriminative information indicated by sample labels. Experiments on synthetic and face-image datasets suggest the effectiveness of NDLA in classification tasks and its robustness to image occlusions, compared with representative NMF-based dimension reduction algorithms. Keywords: patch alignment framework, non-negative matrix factorization, image occlusion. 1 Introduction. Dimension reduction is the process of transforming data from a high-dimensional space to a low-dimensional subspace.
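For context, the classical multiplicative update rule for plain NMF (the Lee-Seung Frobenius-norm version, not the NPAF-specific MUR of this paper) can be sketched as follows; the matrix sizes and iteration count are arbitrary choices for illustration.

```python
import numpy as np

def nmf(V, r, iters=300, eps=1e-9, seed=0):
    # Lee-Seung multiplicative updates for min ||V - WH||_F with W, H >= 0.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H; stays non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W; stays non-negative
    return W, H

rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 15))  # exactly rank-2, non-negative
W, H = nmf(V, r=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

Because each update multiplies by a ratio of non-negative terms, non-negativity is preserved automatically, which is the property the auxiliary-function convergence proofs build on.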

## A Comparison of Sorting Algorithms

- 1405 Words
- 3 Pages
- 6 Works Cited

Sedgewick and Wayne (2011) state that when implementing the algorithm on a computer, before inserting the current element into the emptied location, space must be made by moving the larger elements one position to the right so that the current element can be inserted in its c... ... middle of paper ... ...clusion, the experiment and report successfully described the performance of the sorting algorithms and how they work, and also provoked critical thinking when contradictions were encountered. Works Cited: Bruno R. Preiss, P., 1997. Example-Bucket Sort. http://www.brpreiss.com/books/opus4/html/page74.html. Carrano, F. M., 2012, 2007, 2003.

## Parallel Implementation of MPEG4

- 1490 Words
- 3 Pages
- 8 Works Cited

The Amrita e-learning network is one such system. There are a number of solutions to the delay problem, such as exploiting functional parallelism in the MPEG-4 algorithm and spatio-temporal parallelism. A more interesting solution is to decompose the video sequence into GOPs (Groups of Pictures) and have a dedicated processor independently process each GOP. The basic idea for data distribution is to arrange the uncompressed video sequence into GOPs. Then, we decide (a) how processors get the GOPs, and (b) which GOPs correspond... ... middle of paper ... ...ng network traffic, thus overcoming the network delay.
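The GOP-level data distribution can be sketched with a thread pool standing in for the dedicated processors; the GOP size and the `encode_gop` stand-in below are assumptions, not MPEG-4 code.

```python
from concurrent.futures import ThreadPoolExecutor

GOP_SIZE = 4  # assumed group-of-pictures length

def encode_gop(gop):
    # Stand-in for real MPEG-4 encoding of one GOP; each GOP is
    # self-contained, so no worker needs another worker's frames.
    return sum(gop)

def split_gops(frames, size=GOP_SIZE):
    return [frames[i:i + size] for i in range(0, len(frames), size)]

frames = list(range(16))                  # toy uncompressed frame sequence
gops = split_gops(frames)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(encode_gop, gops))  # GOPs processed independently
print(results)
```

`pool.map` answers point (a) above with a simple static assignment; a real system would also weigh point (b), i.e., which GOPs go to which processor, against network delay.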

## Multivariable Expressions

- 854 Words
- 2 Pages

Pipelining operations that can work on the partial output of preceding operations is another technique that exploits parallel processing opportunities [Smith and Chang 1975; Yao 1979]. For example, restriction and projection can be pipelined so that only a relatively small buffer for data exchange is required rather than the creation and subsequent reading of a possibly large temporary relation. Aspects of simultaneous evaluation and pipelining are combined in the so-called "feedback" method [Clausen 1980; Rothnie 1974, 1975], which uses partial results of a join operation in order to restrict its input. The degree to which this can be done depends on... ... middle of paper ... ...th semijoins that achieve very strong reduction [Chiu and Ho 1980; Chiu et al. 1981].
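Pipelining restriction and projection can be illustrated with generators, where each operator pulls rows from its input one at a time instead of materializing a temporary relation; the toy table and column names below are made up.

```python
employees = [
    {"name": "Ada",  "dept": "R&D",   "salary": 120},
    {"name": "Bob",  "dept": "Sales", "salary": 80},
    {"name": "Cleo", "dept": "R&D",   "salary": 95},
]

def restrict(rows, pred):
    # Restriction (selection): pulls one row at a time from its input.
    for row in rows:
        if pred(row):
            yield row

def project(rows, cols):
    # Projection: consumes rows as the restriction produces them,
    # so only one row is in flight at any moment.
    for row in rows:
        yield {c: row[c] for c in cols}

pipeline = project(restrict(employees, lambda r: r["dept"] == "R&D"),
                   ["name"])
print(list(pipeline))  # -> [{'name': 'Ada'}, {'name': 'Cleo'}]
```

The buffer between the two operators is effectively a single row, which is exactly the "relatively small buffer for data exchange" the text contrasts with writing out a large temporary relation.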

## Image Segmentation: The Techniques Of Image Segmentation

- 1531 Words
- 4 Pages

A dataset of micro-CT images is used. A de-noising filter removes noise from the images as a pre-processing step; feature extraction is performed next; a Back Propagation Neural Network is then created; and lastly, the weights of the network are adjusted and the output is saved. The proposed algorithm is compared with the Thresholding and Region Growing methods. Results have shown that the proposed technique outperforms the other methods on the basis of speed and accuracy of
