Showing posts from June 2, 2010

How to extract the biggest text block from an HTML page?

One of the interesting problems in handling HTML content is auto-detecting the biggest block of text near the center of the page. This can be very useful for on-the-fly content analysis done in the browser. Here is an example of how it could be done by parsing the DOM after the page is rendered.

// Royans K Tharakan (2010 June)
//
// You are free in any form to use as long as you give credit where it's due
// Would appreciate if you submit your changes/improvements back to me or to some other public forum.
// Requires jquery
var largestId = 0;
var largestDiv = null;
var largestSize = -1;

function getLargestDiv() {
    var size = getSize(document.getElementsByTagName("body")[0], 0);
    if (window.location.href.indexOf("")>0){
        return "#bodyContent";
    }
    return "[d_id='tmp_" + largestId+"']";
}

function getSize(currentElement, depth) {