The aim of this assignment is to familiarize students with the process of data mining (DM) on datasets. For this work you will use the tools presented in the tutorial part of the course. Specifically, to run the data mining algorithms you can use Analysis Services in Microsoft SQL Server, WEKA, or Oracle Data Miner. You will work with a well-known dataset from the UCI ML Repository (URL: [login to view URL]). The dataset you will work with is the following:
KDD Cup 1999 Data Set: This dataset includes a wide variety of network intrusions that were generated by simulation. The aim is to build a network intrusion detector: specifically, a predictive model capable of distinguishing "bad" connections, called intrusions or attacks, from "good" normal connections. A detailed description of the dataset can be found at the address below, from which you can also download it.
https://archive.ics.uci.edu/ml/datasets/KDD+Cup+1999+Data
* Use the file that contains 10% of the total observations ([login to view URL])
Suppose we want to analyze the above dataset. Initially we face the problem of data preparation (data preprocessing), so that we keep only the data of interest for the purposes of our analysis. In the second phase we perform various data mining functions (classification, clustering, feature selection, association rules, outlier detection, etc.), having predefined the purpose of our analysis. This process may be repeated several times until the result of the analysis is satisfactory and our knowledge of the data is sufficient, and thus directly usable (actionable) by the decision makers of the individual applications.
Below are the tasks that you must perform to analyze the dataset you have selected:
1st task (data preparation): From the above dataset, choose which data to use for analytical processing, and carry out any preparatory work (selection, cleaning, transformation, sampling, etc.) you consider necessary in order to: a) address, for example, the issue of problematic data (missing or wrong, i.e. irrational, values), continuous ranges, etc.; b) improve the performance of the algorithms you will use in the following tasks. Thus, for each data mining task that follows, you should present the rationale of the preparation (the issues addressed by the data preparation and a description of the procedure followed) and its effect, regardless of whether it turned out to be a bad choice and was therefore not retained when applying the final data mining model. Note that each of the following data mining tasks can (and probably should) be preceded by its own appropriate data preparation.
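As a rough illustration of the kind of preparation meant here, the sketch below imputes missing or irrational values with the median and discretizes a continuous attribute into equal-width bins. It uses plain Python rather than one of the course tools, the records are invented toy values (only the column names `duration`, `protocol_type` and `src_bytes` follow the KDD Cup 1999 schema), and the helper names `impute_with_median` and `equal_width_bins` are hypothetical.

```python
from statistics import median

# Toy records invented for illustration; not taken from the real dataset.
records = [
    {"duration": 0, "protocol_type": "tcp",  "src_bytes": 181},
    {"duration": 2, "protocol_type": "udp",  "src_bytes": None},  # missing value
    {"duration": 5, "protocol_type": "tcp",  "src_bytes": -40},   # irrational value
    {"duration": 1, "protocol_type": "icmp", "src_bytes": 1032},
    {"duration": 0, "protocol_type": "tcp",  "src_bytes": 239},
]

def impute_with_median(rows, column, is_valid):
    """Replace missing or invalid values in `column` with the median of the valid ones."""
    valid = [r[column] for r in rows if r[column] is not None and is_valid(r[column])]
    fill = median(valid)
    cleaned = []
    for r in rows:
        value = r[column]
        if value is None or not is_valid(value):
            value = fill
        cleaned.append({**r, column: value})
    return cleaned

def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1 (equal-width binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1  # fall back to 1 when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

cleaned = impute_with_median(records, "src_bytes", is_valid=lambda v: v >= 0)
bins = equal_width_bins([r["src_bytes"] for r in cleaned], n_bins=3)
print([r["src_bytes"] for r in cleaned], bins)
```

Whether median imputation or equal-width binning is a good choice depends on the task; the assignment asks you to report the effect of each such decision, including the ones you end up discarding.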
2nd task (feature selection): Although it is often part of data preparation, here you are asked to identify separately, and describe exhaustively, scenarios that highlight which features are important for each of the following data mining tasks. How, and by how much, does the performance/accuracy of the techniques change, as opposed to the improvement in the execution speed of the algorithms?
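One common way to rank features, shown here only as a minimal sketch, is to compute the information gain of each categorical feature with respect to the class label. The code is plain Python, not one of the course tools; the four rows are invented toy data, and the function names `entropy` and `information_gain` are hypothetical helpers.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, feature, target):
    """Reduction in label entropy obtained by splitting the rows on `feature`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Toy rows invented for illustration; not taken from the real dataset.
rows = [
    {"protocol_type": "tcp", "flag": "SF", "label": "normal"},
    {"protocol_type": "tcp", "flag": "S0", "label": "attack"},
    {"protocol_type": "udp", "flag": "SF", "label": "normal"},
    {"protocol_type": "udp", "flag": "S0", "label": "attack"},
]

features = ["protocol_type", "flag"]
gains = {f: information_gain(rows, f, "label") for f in features}
ranking = sorted(features, key=gains.get, reverse=True)
print(gains, ranking)
```

In this toy case `flag` separates the labels perfectly while `protocol_type` carries no information, so dropping the latter would speed up training without costing accuracy. On the real data the trade-off has to be measured.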
3rd task (classification / prediction): Consider a classification and prediction scenario (not only the one for which the dataset is usually used) that might be of interest to the analysts of the application. Do not use only the techniques taught in theory, but also others supplied by the tool you work with. Compare the different approaches without relying on the automatic technique that compares algorithms in their default configuration. The aim is to maximize the performance of each algorithm separately by "playing" with the preparation of the dataset and with the algorithm's parameters. Describe the process and explain the results obtained. How could a government agency exploit such knowledge?
4th task (clustering): You will repeat the process, this time for clustering.
5th task (association rule mining): You will repeat the process, this time to analyze associations.
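Tuning a single algorithm's parameters, rather than accepting the defaults, might look roughly like the following sketch. It uses a hand-written k-nearest-neighbours classifier in plain Python as a stand-in for whatever classifier the chosen tool provides; the two numeric features, the training points and the validation points are all invented, and `knn_predict` and `accuracy` are hypothetical helper names.

```python
from collections import Counter

def knn_predict(train, point, k):
    """Majority label among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], point))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Toy data invented for illustration: two normalized numeric features per connection.
train = [
    ((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"), ((0.3, 0.3), "normal"),
    ((0.9, 0.8), "attack"), ((0.8, 0.9), "attack"),
    ((0.35, 0.35), "attack"),  # a noisy point close to the normal group
]
validation = [
    ((0.15, 0.15), "normal"), ((0.34, 0.34), "normal"), ((0.85, 0.85), "attack"),
]

# Evaluate several values of the parameter k on the held-out validation set.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Here k=1 is misled by the noisy training point, while larger k values are not. The same loop structure applies to any parameter of any algorithm, provided the evaluation data is kept separate from the training data.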
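For the clustering task, a minimal sketch of k-means (Lloyd's algorithm) in plain Python is given below. It is not the implementation of any of the course tools; the points and the initial centroids are invented, and fixing the initial centroids by hand is an assumption made only to keep the example deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # New centroid = mean of the cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Toy 2-D points invented for illustration.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids, clusters)
```

On real data the result depends on the number of clusters, the initialization and the scaling of the attributes, which is why the preparation step should be revisited for this task.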
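Association rule mining rests on two measures, support and confidence. The sketch below computes them for all rules between pairs of items, in plain Python; it is a simplified illustration restricted to single-item rules, not a full Apriori implementation. The transactions are invented (the attribute=value strings only borrow names from the KDD Cup 1999 schema), and `support` and `pair_rules` are hypothetical helper names.

```python
from itertools import combinations

# Toy transactions invented for illustration: each connection as a set of attribute=value items.
transactions = [
    {"protocol_type=tcp", "service=http", "label=normal"},
    {"protocol_type=tcp", "service=http", "label=normal"},
    {"protocol_type=icmp", "service=ecr_i", "label=attack"},
    {"protocol_type=icmp", "service=ecr_i", "label=attack"},
    {"protocol_type=tcp", "service=private", "label=attack"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Rules lhs -> rhs between single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pairs cannot yield rules above the support threshold
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

rules = pair_rules(transactions, min_support=0.4, min_confidence=0.9)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
```

Note that continuous attributes have to be discretized before they can appear as items, which ties this task back to the data preparation step.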
I am a data scientist/quantitative developer with over 5 years of experience doing analysis.
My interest and expertise are in using R, Matlab, WEKA, and database development.
Dear Hiring Manager,
We can do this job efficiently and effectively.
Have a great day. I am interested in doing this job. I have over 15 years of experience in web research, web scraping, market research, email collection, data entry in Excel, social media research, Facebook marketing, and work as an admin/virtual assistant. I am computer and web savvy; my goals are to make the employer/customer happy and to deliver better quality work, and I am ready to dedicate time to this project. Thank you.
Kind regards
Heric. P.
Note: I am quick and detail oriented, I have expert team members working with me on this job, and I have a high-speed internet connection. I look forward to hearing from you so we can start this job soon. Thanks.
I have reviewed your problem thoroughly. This work is somewhat tedious, but I like doing data mining for new projects.
You can trust me; I will deliver all desired entries within the given time limit.
Thank you.
My final-year project is related to data mining. The project concerns the selection of informative genes using data mining methods in an R package. I am familiar with data imputation, feature selection, gene selection, classification, and also survival analysis. If I fulfill your requirements, you can email me. Thanks.