Saturday, July 31, 2004

Agent-based Intelligent Reactive Environments

aire is dedicated to examining how to design pervasive computing systems and applications for people. To study this, aire designs and constructs Intelligent Environments (IEs), which are spaces augmented with basic perceptual sensing, speech recognition, and distributed agent logic.
aire's IEs have encompassed a wide range of form factors and sizes, from pocket-sized computers up to networks of conference rooms. Each of these serves as an individual platform, or airespace, on which pervasive computing applications can be layered. Examples of aire applications currently under development include a meeting manager and capture application, contextual and natural-language information retrieval, and a sketch interpretation system.

Further reading: www.ai.mit.edu

Friday, July 30, 2004

Stanford Stream Data Manager

In applications such as network monitoring, telecommunications data management, web personalization, manufacturing, sensor networks, and others, data takes the form of continuous data streams rather than finite stored data sets, and clients require long-running continuous queries as opposed to one-time queries. Traditional database systems and data processing algorithms are ill-equipped to handle complex and numerous continuous queries over data streams, and many aspects of data management and processing need to be reconsidered in their presence. In the STREAM project, we are reinvestigating data management and query processing in the presence of multiple, continuous, rapid, time-varying data streams. We are attacking problems ranging from basic theoretical results to algorithms to the implementation of a comprehensive prototype data stream management system.
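
To make the contrast with one-time queries concrete, here is a minimal Python sketch of a continuous query: a sliding-window average that emits a fresh answer for every new stream element. The function name and the packet-size stream are invented for illustration; they are not part of the STREAM prototype, whose continuous queries are expressed declaratively.

    from collections import deque

    def sliding_average(stream, window_size):
        """Continuous query: average of the last window_size values.

        Unlike a one-time query over a stored table, this runs for as
        long as the stream keeps producing tuples, updating its answer
        each time a new value arrives and an old one leaves the window.
        """
        window = deque(maxlen=window_size)   # old values fall out automatically
        for value in stream:
            window.append(value)
            yield sum(window) / len(window)

    # Hypothetical stream of packet sizes from a network monitor.
    for answer in sliding_average([512, 1480, 64, 256, 1024], window_size=3):
        print(answer)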

Peer-to-Peer

Peer-to-peer (P2P) systems have become a popular medium for sharing huge amounts of data. P2P systems distribute the main costs of sharing data - disk space for storing files and bandwidth for transferring them - across the peers in the network, enabling applications to scale without powerful, expensive servers. By aggregating the resources of many peers, a P2P system can dwarf the capabilities of many centralized systems at little cost.
There are, however, important challenges to overcome before the full potential of P2P systems can be realized. For example, the scale of the network and the autonomy of its nodes make it difficult to identify, model, and distribute the resources that are available. Furthermore, some nodes may be malicious, which makes it difficult to provide peers with authentic information or to prevent denial-of-service attacks. These issues, and others, have motivated our research on understanding and improving P2P systems.
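
As a toy illustration of how peers can share the cost of storing files without a central server, here is a Python sketch of hash-based placement, the idea behind distributed hash tables. The peer names and file names are hypothetical, and real systems add routing, replication, and handling of peers joining and leaving.

    import hashlib

    def peer_for(key, peers):
        """Pick the peer responsible for a file by hashing its name.

        Any node can compute this locally, so no central server is
        needed to decide where a file lives or where to look for it.
        """
        digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return peers[digest % len(peers)]

    peers = ["peer-a", "peer-b", "peer-c"]            # hypothetical nodes
    for name in ["song.mp3", "paper.pdf", "movie.avi"]:
        print(name, "->", peer_for(name, peers))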

Thursday, July 29, 2004

Intelligent Crawling

An efficient crawler needs knowledge bases, a crawling algorithm, and an analysis of the crawler's ability to learn. The crawler's knowledge bases are built incrementally from the logs of previous crawls. To make the next crawl more efficient, we present three knowledge bases: starting URLs, topic keywords, and URL prediction. Good starting URLs help the crawler collect as many relevant web pages as possible. Good topic keywords help the crawler recognize the keywords that match the required topic. Good URL prediction helps the crawler predict the relevance of the content behind unvisited URLs. The crawling algorithm is separated into two parts: crawling without knowledge bases and crawling with them. Crawling without knowledge bases is used on the first crawl, to explore the Internet. The information gathered from the first crawl is accumulated as the crawler's experience, i.e. its knowledge bases, over consecutive crawls. Crawling with knowledge bases is then used on subsequent crawls for more efficient results and better network bandwidth utilization; a sketch of this mode follows below.  - Weblink Team
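
The two modes can be pictured with a short Python sketch: a best-first crawler that, once a topic-keyword knowledge base exists, scores unvisited URLs and fetches the most promising first. The scoring rule and the fetch() helper are stand-ins invented for illustration, not the Weblink Team's actual algorithm.

    import heapq

    def url_score(url, topic_keywords):
        """Crude URL prediction: count topic keywords in the URL string.

        A stand-in for a learned model; a real crawler would build this
        knowledge base from the logs of previous crawls.
        """
        return sum(kw in url.lower() for kw in topic_keywords)

    def crawl(starting_urls, topic_keywords, fetch, max_pages=100):
        """Best-first crawl: the highest-scoring URLs are fetched first.

        fetch(url) is an assumed helper returning (page_text, out_links).
        With an empty keyword set this degenerates into the "first crawl"
        mode, since every URL scores zero and the order is arbitrary.
        """
        frontier = [(-url_score(u, topic_keywords), u) for u in starting_urls]
        heapq.heapify(frontier)
        seen, pages = set(starting_urls), []
        while frontier and len(pages) < max_pages:
            _, url = heapq.heappop(frontier)
            text, out_links = fetch(url)
            pages.append((url, text))
            for link in out_links:
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier,
                                   (-url_score(link, topic_keywords), link))
        return pages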

Wednesday, July 28, 2004

What is PageRank? (Explained)

In short, PageRank is a "vote" by all the other pages on the Web about how important a page is. A link to a page counts as a vote of support. If there's no link, there's no support (but it's an abstention from voting rather than a vote against the page).

Quoting from the original Google paper, PageRank is defined like this: "We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one. PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web."

But that's not too helpful, so let's break it down into sections.

1. PR(Tn) - Each page has a notion of its own self-importance. That's "PR(T1)" for the first page on the web, all the way up to "PR(Tn)" for the last page.

2. C(Tn) - Each page spreads its vote out evenly amongst all of its outgoing links. The count, or number, of outgoing links for page 1 is "C(T1)", "C(Tn)" for page n, and so on for all pages.

3. PR(Tn)/C(Tn) - So if our page (page A) has a backlink from page "n", the share of the vote page A will get is "PR(Tn)/C(Tn)".

4. d(...) - All these fractions of votes are added together but, to stop the other pages having too much influence, this total vote is "damped down" by multiplying it by 0.85 (the factor "d").

5. (1 - d) - The (1 - d) bit at the beginning is a bit of probability maths so that the "sum of all web pages' PageRanks will be one": it adds in the bit lost by the d(...). It also means that if a page has no links to it (no backlinks), it will still get a small PR of 0.15 (i.e. 1 - 0.85). (Aside: the Google paper says "the sum of all pages", but they mean "the normalised sum" - otherwise known as "the average" to you and me.)
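
To see the "simple iterative algorithm" in action, here is a minimal Python sketch of the per-page formula above. The link graph, the function name, and the page labels are made up for illustration; a real implementation would also handle dangling pages and test for convergence.

    def pagerank(out_links, d=0.85, iterations=50):
        """Iterate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

        out_links maps each page to the list of pages it links to.
        Every PR value starts at 1 and settles after a few dozen rounds;
        in this per-page form the values average to 1 rather than summing
        to 1 (see the aside above).
        """
        pages = list(out_links)
        pr = {p: 1.0 for p in pages}
        c = {p: len(out_links[p]) or 1 for p in pages}   # C(p), guard div-by-0
        for _ in range(iterations):
            pr = {a: (1 - d) + d * sum(pr[t] / c[t]
                                       for t in pages if a in out_links[t])
                  for a in pages}
        return pr

    # Toy web: A and B link to each other, C links only to A.
    print(pagerank({"A": ["B"], "B": ["A"], "C": ["A"]}))
    # -> roughly {'A': 1.46, 'B': 1.39, 'C': 0.15}

Note how C, with no backlinks, gets exactly the minimum PR of 0.15 described in point 5.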
Further reading: The PageRank Vector - Deepak Thukral and Ashish Juneja