Web mining

Project Owner : Shyam.C
Created Date : Sat, 10/03/2012 - 21:51
Project Description :

Web mining is the application of data mining techniques to discover patterns from the Web. Depending on the target of the analysis, web mining can be divided into three types: Web usage mining, Web content mining and Web structure mining.


With the recent explosive growth of the amount of content on the Internet, it has become increasingly difficult for users to find and utilize information and for content providers to classify and catalog documents. Traditional web search engines often return hundreds or thousands of results for a search, which is time consuming for users to browse. On-line libraries, search engines, and other large document repositories (e.g. customer support databases, product specification databases, press release archives, news story archives, etc.) are growing so rapidly that it is difficult and costly to categorize every document manually. In order to deal with these problems, researchers look toward automated methods of working with web documents so that they can be more easily browsed, organized, and cataloged with minimal human intervention.

Web usage mining

Web usage mining is the process of extracting useful information from server logs, i.e. users' browsing history, in order to find out what users are looking for on the Internet. Some users might be looking at only textual data, whereas others might be interested in multimedia data.
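
As a rough illustration of extracting information from server logs, the following Python sketch counts page requests and groups them by visitor. It assumes the logs are in the Common Log Format and that a visitor can be identified by host address alone; the sample lines and the function names (parse_log, usage_summary) are made up for this example.

```python
import re
from collections import Counter, defaultdict

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def parse_log(lines):
    """Yield (host, path) for each GET request answered with status 200."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("method") == "GET" and m.group("status") == "200":
            yield m.group("host"), m.group("path")

def usage_summary(lines):
    """Return overall page popularity and the pages requested by each host."""
    page_hits = Counter()
    pages_by_host = defaultdict(list)
    for host, path in parse_log(lines):
        page_hits[path] += 1
        pages_by_host[host].append(path)
    return page_hits, dict(pages_by_host)

# Hypothetical log lines, invented for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Mar/2012:21:51:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [10/Mar/2012:21:51:09 +0000] "GET /videos/intro.mp4 HTTP/1.1" 200 90211',
    '10.0.0.2 - - [10/Mar/2012:21:52:30 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [10/Mar/2012:21:52:41 +0000] "GET /missing.html HTTP/1.1" 404 512',
]

page_hits, pages_by_host = usage_summary(SAMPLE_LOG)
print(page_hits.most_common())
print(pages_by_host)
```

A real deployment would typically also group requests into sessions and filter out crawlers; both steps are left out here to keep the sketch short.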

Web structure mining

Web structure mining is the process of using graph theory to analyze the node and connection structure of a web site. According to the type of web structural data, web structure mining can be divided into two kinds:

1. Extracting patterns from hyperlinks in the web: a hyperlink is a structural component that connects the web page to a different location.

2. Mining the document structure: analysis of the tree-like structure of pages to describe HTML or XML tag usage.
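
Both kinds of structure mining can be sketched with Python's standard html.parser module. The example below is only an illustration under simple assumptions: the three pages are invented, links are taken as written (no URL normalization), and the number of incoming links is used as a crude measure of a page's importance in the link graph.

```python
from collections import Counter
from html.parser import HTMLParser

class StructureParser(HTMLParser):
    """Collect the outgoing hyperlinks and the tag usage of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.tag_counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def mine_structure(pages):
    """Return the link graph, incoming-link counts and per-page tag counts."""
    graph, tags = {}, {}
    for url, html in pages.items():
        parser = StructureParser()
        parser.feed(html)
        parser.close()
        graph[url] = parser.links          # kind 1: hyperlink structure
        tags[url] = parser.tag_counts      # kind 2: document structure
    in_degree = Counter(t for targets in graph.values() for t in targets)
    return graph, in_degree, tags

# Hypothetical three-page site, invented for illustration only.
PAGES = {
    "/index.html": '<html><body><h1>Home</h1><a href="/about.html">About</a>'
                   '<a href="/news.html">News</a></body></html>',
    "/about.html": '<html><body><p>About us</p><a href="/index.html">Home</a></body></html>',
    "/news.html": '<html><body><p>News</p><a href="/index.html">Home</a>'
                  '<a href="/about.html">About</a></body></html>',
}

graph, in_degree, tags = mine_structure(PAGES)
print(graph)
print(in_degree)
print(tags["/index.html"])
```

Counting incoming links is the simplest possible graph measure; link-analysis algorithms such as PageRank or HITS work on the same kind of graph.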

Web content mining

Web content mining is the mining, extraction and integration of useful data, information and knowledge from Web page contents.
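
One very simple form of content extraction is to pull the visible text out of a page and count its most frequent terms. The Python sketch below illustrates this under stated assumptions: the page, the small stop-word list and the names TextExtractor and top_terms are all invented for the example, and real content mining usually relies on far richer text-processing techniques.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in"}

def top_terms(html, n=3):
    """Return the n most frequent non-stop-words in the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(n)

# Hypothetical page, invented for illustration only.
PAGE = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Web mining</h1><p>Web mining applies data mining to the Web.</p>"
    "</body></html>"
)

print(top_terms(PAGE, 2))
```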

Web usage mining pros and cons


Web usage mining has many advantages that make it attractive to corporations and government agencies. It has enabled e-commerce companies to do personalized marketing, which eventually results in higher trade volumes. Government agencies use it to classify threats and fight terrorism, and the predictive capability of mining applications can benefit society by identifying criminal activities. Companies can establish better customer relationships by giving customers exactly what they need: they can understand customer needs better and react to them faster. Companies can find, attract and retain customers, and they can save on production costs by using the acquired insight into customer requirements. They can increase profitability through target pricing based on the profiles created. They can even identify a customer who might defect to a competitor and try to retain that customer with targeted promotional offers, thus reducing the risk of losing customers.


Web usage mining by itself does not create issues, but when it is used on data of a personal nature it might cause concerns. The most criticized ethical issue involving web usage mining is the invasion of privacy. Privacy is considered lost when information concerning an individual is obtained, used, or disseminated, especially if this occurs without their knowledge or consent. The obtained data is analyzed and clustered to form profiles; it is made anonymous before clustering so that there are no personal profiles. Thus these applications de-individualize users by judging them by their mouse clicks. De-individualization can be defined as a tendency to judge and treat people on the basis of group characteristics instead of their own individual characteristics and merits.
Another important concern is that companies collecting data for a specific purpose might use it for a totally different purpose, which violates the user’s interests. The growing trend of selling personal data as a commodity encourages website owners to trade personal data obtained from their sites. This trend has increased the amount of data being captured and traded, increasing the likelihood of one’s privacy being invaded. The companies that buy the data are obliged to make it anonymous, and they are considered the authors of any specific release of mining patterns. They are legally responsible for the contents of the release, and any inaccuracies in the release can result in serious lawsuits, but there is no law preventing them from trading the data.
