1) Title: Study on Data Mining and Big Data
• Methodology: Algorithms
• Description: Data Mining comprises several algorithms that fall into four different categories (Shobana et al. 2015):
Association Rule
Clustering
Classification
Regression
Association algorithms are used to search for relationships between variables. They are applied when searching for frequently visited items; in short, association algorithms establish relationships among objects.
Clustering algorithms are used to discover structures and groups in the data, i.e. they determine which group each data point belongs to.
Classification algorithms deal with associating unknown structures to known structures.
Regression algorithms find functions to model the data (Shobana et al. 2012).
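To make the association-rule idea concrete, the following is a minimal sketch (not taken from the cited paper): it counts item pairs over a handful of hypothetical transactions and prints the one-to-one rules whose support and confidence clear illustrative thresholds.

```python
from itertools import combinations

# Hypothetical transactions (e.g. items viewed together in one session).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]

MIN_SUPPORT = 0.4      # fraction of transactions containing the itemset
MIN_CONFIDENCE = 0.6   # estimate of P(consequent | antecedent)

n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

# Enumerate simple one-to-one rules A -> B from frequent pairs.
items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    pair_support = support({a, b})
    if pair_support < MIN_SUPPORT:
        continue
    for antecedent, consequent in ((a, b), (b, a)):
        confidence = pair_support / support({antecedent})
        if confidence >= MIN_CONFIDENCE:
            print(f"{antecedent} -> {consequent}: "
                  f"support={pair_support:.2f}, confidence={confidence:.2f}")
```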
3) Title: Big Data analytics in healthcare: promise and potential
• Methodology: Questionnaire (Groves et al. 2014)
• Methodology: Modelling, Algorithms
• Description: Association algorithms in data mining are used to search for relationships between variables. They are applied when searching for frequently visited items.
Association algorithms and predictive modelling can analyse the buying habits of pregnant women and identify products that serve as indicators that a shopper is pregnant (Mayer-Schönberger & Cukier 2014). Retail companies can use that information to market those products to pregnant women.
5) Title: Critical Questions for Big Data (Boyd & Crawford 2012)
• Methodology: Focus groups, Case Study (Boyd & Crawford 2012)
The K-means algorithm is used for cluster analysis: it divides the data points into k clusters, assigning each point to a cluster based on feature similarity.
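As a minimal illustration of this grouping step, the sketch below implements plain K-means on a tiny 2-D data set; the data, the value of k, and the iteration count are all assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=20):
    """Very small K-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its assigned points."""
    centroids = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, clusters

# Toy 2-D data with two visible groups.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]
centroids, clusters = kmeans(data, k=2)
print(centroids)
```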
Over the past few decades, the generation and availability of information in cyberspace has increased enormously. There is a pressing need for solutions that help filter relevant data out of this disorganised mass so that users can select the most suitable data from the available collection. Many strategies have been developed to assist users in selecting relevant information. Applications on the internet make searching more convenient by incorporating recommender systems, which help to filter unwanted information, predict the needs and preferences of users (Long, Zhang, & Hu, 2011), and provide suggestions to the users. Compared to other fields of information systems, recommender systems are a relatively new field, as they initially formed part of information retrieval and the management sciences.
… The basic idea is that roughly half of the data is clustered through hierarchical clustering, followed by K-means for the remainder. In order to create super-rules, the hierarchical step is terminated when it generates the largest number of clusters.
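A hedged sketch of this hybrid scheme is shown below, assuming scikit-learn is available; the split ratio, number of clusters, and toy data are illustrative choices, not taken from the cited work.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

rng = np.random.default_rng(0)
# Toy data: two Gaussian blobs.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
rng.shuffle(X)

k = 2
half = len(X) // 2
X_first, X_rest = X[:half], X[half:]

# Step 1: hierarchical (agglomerative) clustering on roughly half of the data.
hier_labels = AgglomerativeClustering(n_clusters=k).fit_predict(X_first)

# Step 2: use the hierarchical cluster means to seed K-means on the rest.
seeds = np.array([X_first[hier_labels == i].mean(axis=0) for i in range(k)])
km = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X_rest)

print("hierarchical seeds:\n", seeds)
print("final K-means centroids:\n", km.cluster_centers_)
```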
This chapter discusses the essential ideas of the PageRank algorithm, analyses its computational formula, and then mentions some problems related to the algorithm. With the rapid development of the World Wide Web, users face the problem of retrieving useful information from a large amount of disordered and scattered information, and current search engines cannot fully satisfy the need for high-quality search services. The most classic web structure mining algorithm is the PageRank algorithm. PageRank is based on the idea that if a page has important links pointing towards it, then its links towards other pages are also to be considered important. The algorithm calculates the importance of web pages using the link structure of the web. Rather than simply counting in-links equally, it normalises by the number of links on a page when distributing rank scores. Therefore PageRank (a numeric value that represents how important a page is on the web) takes back links into account and propagates ranking through links: a page has a high rank if the sum of the ranks of its back links (in-links) is high. The PageRank algorithm is one of the methods that Google, the famous search engine, uses to determine the importance or relevance of a web page.
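The following is a minimal power-iteration sketch of PageRank on a toy link graph; the graph, the damping factor, and the iteration count are illustrative assumptions rather than values from the chapter.

```python
# Minimal PageRank by power iteration on a toy link graph.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):
    # Each page keeps a small base rank and receives shares from its in-links.
    new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```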
It is a well-known fact that humans have the ability to efficiently recognize patterns. Some people who work for Google have highlighted that backlinks, keywords, title tags, and meta descriptions are useful factors for sorting and ranking websites. However, recognizing such patterns at massive scale is something that humans cannot easily do. Machines, on the other hand, are extremely efficient at gathering data; unlike humans, though, they cannot as easily recognize how certain patterns fit into the overall big picture, nor understand what that picture means.
Classification is a supervised learning process in which the data is grouped against a known class tag. It is a task that consists of discovering knowledge that can be used to forecast the class of a record whose class label is unknown. In mammogram image classification it is used to categorize images under different class tags depending on the characteristics of the image. Classification targets are discrete and do not entail any order; a continuous, floating-point value would designate a numerical target rather than a categorical one.
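As a toy illustration of supervised classification (not an actual mammogram pipeline), the sketch below uses a simple nearest-centroid rule on invented two-dimensional feature vectors with hypothetical "benign"/"malignant" class tags.

```python
# Nearest-centroid classification on hypothetical feature vectors.
# Features and class labels are invented for illustration only.
training = {
    "benign":    [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15)],
    "malignant": [(0.8, 0.9), (0.7, 0.85), (0.9, 0.8)],
}

def centroid(vectors):
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

centroids = {label: centroid(vecs) for label, vecs in training.items()}

def classify(x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(x, centroids[label])))

print(classify((0.75, 0.80)))  # expected: malignant
print(classify((0.22, 0.18)))  # expected: benign
```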
In the development of web search, link analysis (the analysis of hyperlinks and of the graph structure of the Web) has proved helpful; it is one of the factors web search engines consider when computing a composite rank for a web page on a given user query. The directed graph formed by pages and hyperlinks is known as the web graph. There are several algorithms based on link analysis; the important ones are Hypertext Induced Topic Search (HITS), PageRank, Weighted PageRank, and Weighted Page Content Rank.
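For comparison with the PageRank sketch above, here is a minimal HITS sketch that iteratively updates hub and authority scores on the same kind of toy link graph; the graph and iteration count are illustrative assumptions.

```python
import math

# Toy web graph: page -> pages it links to (illustrative only).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A", "C"],
}

pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(50):
    # Authority score: sum of hub scores of the pages linking to this page.
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    # Hub score: sum of authority scores of the pages this page links to.
    hub = {p: sum(auth[t] for t in links[p]) for p in pages}
    # Normalise so the scores do not grow without bound.
    a_norm = math.sqrt(sum(v * v for v in auth.values()))
    h_norm = math.sqrt(sum(v * v for v in hub.values()))
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

print("authorities:", {p: round(v, 3) for p, v in auth.items()})
print("hubs:", {p: round(v, 3) for p, v in hub.items()})
```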
To the programming community, the algorithms described in this chapter and their methods are known as feature selection algorithms. This subject has been examined by researchers for decades, and a large number of methods have been proposed. The terms attribute and feature are interchangeable and refer to the predictor values throughout this chapter and the remainder of the thesis. In this framework, dimensionality reduction techniques are typically made up of two basic components [21], [3], [9].
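A small sketch of a filter-style feature selector follows: it ranks attributes by their absolute correlation with the target and keeps those above a threshold. The data, the threshold, and the choice of correlation as the scoring criterion are assumptions for illustration; correlation is only one of many possible filters.

```python
import numpy as np

# Hypothetical data: 200 samples, 5 candidate features, one target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

# Filter-style selection: score each feature by |correlation with y|.
scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
ranked = sorted(range(X.shape[1]), key=lambda j: -scores[j])

keep = [j for j in ranked if scores[j] > 0.3]  # illustrative threshold
print("feature scores:", [round(s, 2) for s in scores])
print("selected features:", keep)
```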
In today’s society, technology has become more advanced than the human mind can easily keep up with. Companies want to make sure that their information systems stay up to date with this rapidly evolving technology. It is very important to senior-level executives and boards of directors that their systems produce the right and best information for the company, leading to better outcomes and new organizational capabilities. Big data and data analytics are among the important factors that contribute to a successful company and to its up-to-date software and information systems.
Big Data is a term used to refer to extremely large and complex data sets that have grown beyond the ability to manage and analyse them with traditional data processing tools. Big Data nonetheless contains a lot of valuable information which, if extracted successfully, can greatly help business and scientific research, predict an upcoming epidemic, and even determine traffic conditions in real time. Therefore, these data must be collected, organized, stored, searched, and shared in a different way than usual. This article introduces Big Data, the methods people use to exploit it, and how it helps our lives.
The data is derived from health-care systems, clinical trials, real-time monitoring, and other sources. Machine learning algorithms are used to predict the required patterns. Through causal relationships, new knowledge is discovered, which can meet resistance in existing knowledge-based systems. The biggest challenge in making predictive analysis operational is that it often yields wrong predictions; correct predictions are difficult to attain, and the cost of prediction is high.
The key objective of any data mining activity is to find as many unsuspected relationships between the obtained data sets as possible, in order to achieve a better understanding of how the data and its relationships are useful to the data owner. The potential of knowledge discovery using data mining is huge, and data mining has been applied in many different knowledge areas, from large corporations optimizing their marketing strategies to smaller-scale medical research, where data mining is used to find relationships between patients' data and the corresponding prescriptions and symptoms.
Big data is a concept that has been misunderstood; therefore, I will be writing this paper with the intention of thoroughly discussing this technological concept and all its dimensions, with regard to what constitutes big data and how the term came about. The rapid innovations in Information Technology have brought about the realisation of big data. The concept of big data is complex and has different connotations, but I intend to clarify its functions. Big data refers to a collection of data sets so large and complex that they are extremely difficult to notate or even process with most on-hand devices and database technologies.
Classification is the process of a computer relating a subject to a category. To best explain this concept, Stephen Marsland states: “…consider a vending machine, where we use a neural network to learn to recognize different coins” (Machine Learning, Section 1.4). The computer learns by analyzing large amounts of data and then categorizing the data. This is how a computer system can identify a certain type of illness or disease and thereby assist medical staff. In addition, supervised learning can also utilize regression.
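In the spirit of the vending-machine example, the toy sketch below classifies a coin from hypothetical weight and diameter measurements using a nearest-neighbour rule; the numbers are invented, and Marsland's example actually uses a neural network rather than nearest neighbour.

```python
# Toy coin recognition: classify a coin by its nearest labelled example.
# Weights (g) and diameters (mm) are invented for illustration.
known_coins = [
    ((2.5, 19.0), "10 cents"),
    ((5.7, 24.3), "50 cents"),
    ((7.6, 28.0), "1 dollar"),
]

def recognise(measurement):
    def distance(example):
        features, _label = example
        return sum((a - b) ** 2 for a, b in zip(measurement, features))
    return min(known_coins, key=distance)[1]

print(recognise((5.5, 24.0)))  # expected: "50 cents"
```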