Breast cancer is a severe disease affecting women all over the world. It arises from breast tissue cells, usually from the inner lining of the milk ducts or from the lobules that supply the ducts with milk. A recent medical survey reports that, worldwide, breast cancer accounts for 22.9% of all cancers in women and causes 13.7% of their cancer deaths. The disease can lead to the loss of a breast and can be fatal. Diagnosis of breast cancer is an important area of data mining research, and classification, an essential data mining process, supports clinical diagnosis and analysis of the disease. In our work, different classification techniques are applied to the benchmark Breast Cancer Wisconsin dataset from the UCI Machine Learning Repository for the detection of breast cancer. Principal component analysis (PCA) is used to reduce the dimension of the dataset. Our objective is to diagnose and analyze breast cancer with the help of two well-known classifiers, namely the MLP backpropagation neural network (MLP BPN) and the support vector machine (SVM), and to assess their performance in terms of measures such as Precision, Recall, F-Measure and ROC Area.
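The performance measures named above can be illustrated with a minimal sketch in plain Python. It is not the authors' evaluation code: the function names `binary_metrics` and `roc_area`, the label encoding (1 = malignant, 0 = benign) and the example labels and scores are all assumptions made for illustration, not values from the Breast Cancer Wisconsin dataset.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and F-measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


def roc_area(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier output (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = binary_metrics(y_true, y_pred)
auc = roc_area(y_true, scores)
print(metrics, auc)
```

Precision, recall and F-measure depend on the decision threshold used to turn scores into labels, whereas the ROC area summarizes ranking quality across all thresholds, which is one reason the measures are usually reported together.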
Data is the core element in this era of technological advancement and information science. Vast amounts of data are collected periodically for operational purposes in business, administration, banking, medical science, environmental protection, security and politics. Such data sets are both huge and complex. We therefore require robust, simple and computationally efficient tools to extract info...
... middle of paper ...
... The Turkish Journal of Electrical Engineering & Computer Sciences Volume 21, Issue 1 (2013).
[9] D. Nauck, F. Klawonn, and R. Kruse, Foundations of Neuro-Fuzzy Systems. Wiley, Chichester (1997).
[10] D. Venet, J. E. Dumont, V. Detours, Most random gene expression signatures are significantly associated with breast cancer outcome, PLoS Comput. Biol. 7, e1002240 (2011).
[11] D. Hanahan, R. A. Weinberg, Hallmarks of cancer: The next generation. Cell 144, 646–674 (2011).
[12] J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Ed., Morgan Kaufmann (2005).
[13] R. Rojas: Neural Networks A Systematic Introduction, Springer-Verlag, Berlin (1996).
[14] C. Cortes and V. Vapnik, Support-Vector Networks, Machine Learning, Volume 20, Issue 3, pp. 273-297 (1995).
[15] Breast Cancer Wisconsin (Original) dataset, UCI Machine Learning Repository (1992).
The concept of tumor heterogeneity being related to the course of the disease and clinical outcome in cancer patients draws additional attention in the era of personalized medicine (1). Current cancer treatment strategies are based on the site of origin of the primary tumor. However, it was shown that tumors developed from distinct cell types differ in their prognosis and response to cytotoxic therapies (2...
... middle of paper ... ... In Intelligent Data Engineering and Automated Learning–IDEAL 2006 (pp. 1346-1357). Springer, Berlin, Heidelberg.
Cancer.gov. (2014). Comprehensive Cancer Information - National Cancer Institute. [online] Retrieved from: http://www.cancer.gov/ [Accessed: 7 Apr 2014].
Encyclopaedia of Molecular Cell biology and molecular medicine, Robert Meyers, 2004, Wiley (page 221/426/385/416/237/ 2224/5321/5414/8869)
Kandel, E. R., J. H. Schwartz, and T. M. Jessell. Principles of Neural Science. 3rd ed. New York: Elsevier, 1991.
Stergiou, C., & Siganos, D. (2011, August 6). Neural Networks. Retrieved August 6, 2011, from
Signatures 2, 4, 5, 13, and 16 showed significant contributions. Signature 4 is characterized by C>A mutations and is likely the consequence of misreplication of DNA damaged by carcinogens. Signatures 2 and 13 are made up of C>T and C>G mutations; they were more frequent in smokers than in nonsmokers only in lung cancer, although they were present in the other cancer types as well, unrelated to tobacco smoking. Signature 5 is characterized by mutations across all 96 subtypes of base substitution and is found in all cancer types. It is widespread in nonsmokers and in cancers unrelated to smoking; it is therefore probably not a direct consequence of misreplication of DNA damaged by carcinogens. Signature 16 is characterized by T>C mutations and was elevated in smokers versus nonsmokers only in liver cancer, but its mechanism is
Panno, J. (2005). Cancer: The Role of Genes, Lifestyle, and Environment. Facts on File Science Library: The New Biology. Facts On File.
“One in every ten women in the United States will develop breast cancer sometime during her life” (Breast Care). More than six percent of these cases are linked to heredity. Many measures can be taken to detect breast cancer in its early stages. Women who believe they are at higher risk should have breast cancer gene testing.
Breast cancer is a frightening disease that can be harmful to many people; however, as time passes, more is known about it. The thirty percent decrease in deaths due to breast cancer is tremendous, and it is just a start. With an understanding of its prevention, treatment and symptoms, breast cancer is a disease that can be beaten.
Elk, Ronit and Monica Morrow. “Causing the Mutation: Genetics or Environment?” 2003. Breast Cancer for Dummies. Hoboken, NJ: Wiley Publishing, Inc., 2003. 45-46. Print.
National Cancer Institute. 2 December 2013. April 2014. WHO. World Health Organization.
Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of Data Mining. Cambridge, Mass.: MIT Press.
Artificial neural networks are systems, implemented as specialized hardware or sophisticated software, that loosely model the learning and remembering functions of the human brain. They attempt to simulate the brain's multiple layers of processing elements, called neurons. These elements are arranged so that the layers can learn from prior experience and remember their outputs. In this way, the system can learn to recognize certain patterns and situations and produce appropriate outputs for them. Such networks can be used in many important settings, such as prioritizing patients in an emergency room, assessing applications for financial assistance, and pattern recognition tasks such as handwriting or speech recognition.
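Learning from prior experience can be made concrete with the simplest possible case: a single processing element (a perceptron) that adjusts its weights whenever it answers incorrectly. The sketch below is only an illustration under stated assumptions, not a multi-layer network. The function names `predict` and `train_perceptron`, the learning rate, the number of epochs and the toy task (the logical AND of two inputs) are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Step activation: the neuron fires (1) if the weighted sum is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(
    samples: Sequence[Tuple[Sequence[float], int]],
    epochs: int = 20,
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Perceptron learning rule: after each wrong answer, nudge the
    weights and bias toward the correct output."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error != 0:
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
    return weights, bias


# Toy "experience": the logical AND of two binary inputs (assumed task).
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, x) for x, _ in and_samples]
print(predictions)
```

A single element of this kind can only separate classes with a straight line; stacking such elements into layers and training them with backpropagation is what lets a network capture more complex patterns.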
Machine learning systems can be categorized according to many different criteria. We will discuss three: classification by the underlying learning strategy used, classification by the representation of the knowledge or skill acquired by the learner, and classification by the application domain of the performance system for which knowledge is acquired.