java - Derive CSS selector based on Element instance
Background
Many questions ask how to obtain a particular DOM element given a CSS selector. This question is about the opposite direction. The document is parsed with jsoup, but could be converted for use with another HTML parsing API.
Use case
For a particular problem domain (e.g., chemical compounds), thousands of web pages list chemicals in similar ways, but the mark-up differs across web sites. For example:
<div id="chemical-list">
  <div class="compound">
    <span class="compound-name">water</span>
    <span class="compound-periodic">H2O</span>
  </div>
  <div class="compound">
    <span class="compound-name">sodium hypochlorite</span>
    <span class="compound-periodic">NaClO</span>
  </div>
</div>
Another site might list them differently:
<ul class="chemical-compound">
  <li class="chem-name">water, H2O</li>
  <li class="chem-name">sodium hypochlorite, NaClO</li>
</ul>
Yet another site might, again, use different markup:
<table border="0" cellpadding="0" cellspacing="0">
  <tbody>
    <tr><td>water</td><td>H2O</td></tr>
    <tr><td>sodium hypochlorite</td><td>NaClO</td></tr>
  </tbody>
</table>
A few sample pages from each of the thousands of sites will be downloaded. Then, using an existing list of chemicals, it is relatively simple to retrieve a list of candidate web page elements. Using jsoup, this is as simple as:
Elements elements = chemicals.getElementsMatchingOwnText( chemicalNames );
This will allow high-precision analysis across thousands of pages. (The page can discuss applications of water and sodium hypochlorite, but only the list is being analyzed.) Knowing the CSS will simplify the analysis and increase its accuracy.
The alternative is to process the entire page looking for "groups" of chemicals, then try to extract the list. Both problems are difficult, but using a CSS selector to jump to the exact spot in the page is far more efficient, and far more accurate. Both problems will require some hand-crafting, but I'd like to automate away as much as possible.
Problem
The aforementioned APIs do not appear to have methods that generate a CSS selector given an Element instance (the more unique the better). It is possible to iterate through the parent elements and generate the selector manually. This has been demonstrated using JavaScript in a few questions. There are also answers for generating an XPath, and it might be possible using Selenium.
Specifically, how would you do something like:
String selector = element.getCssPath();
Elements elements = document.select( selector );
This would:
- Return the CSS selector for the given element.
- Search the document for the given CSS selector.
- Return a list of elements that match the selector.
The second line is not an issue; the first line is problematic.
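The manual approach mentioned above (walking up the parent elements and joining the pieces) can be sketched in plain Java. This is only an illustration under stated assumptions: it uses the JDK's built-in org.w3c.dom API on well-formed XHTML rather than jsoup, the class name CssPathSketch and the methods parse and cssPath are made up for this example, and it does not escape special characters in ids or class names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of jsoup or any other library.
final class CssPathSketch {

    /** Parses well-formed XML/XHTML with the JDK parser (assumption: input is well-formed). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** Walks from the element up to the root, collecting "tag.class" steps; stops early at an id. */
    static String cssPath(Element element) {
        Deque<String> parts = new ArrayDeque<>();
        Node node = element;
        while (node instanceof Element) {
            Element current = (Element) node;
            // getAttribute returns "" when the attribute is absent.
            String id = current.getAttribute("id");
            if (!id.isEmpty()) {
                parts.addFirst("#" + id); // an id is expected to be unique, so stop here
                break;
            }
            String classes = current.getAttribute("class").trim();
            String suffix = classes.isEmpty() ? "" : "." + classes.replaceAll("\\s+", ".");
            parts.addFirst(current.getTagName() + suffix);
            node = current.getParentNode(); // the Document node ends the loop
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<div id=\"chemical-list\"><div class=\"compound\">"
              + "<span class=\"compound-name\">water</span>"
              + "<span class=\"compound-periodic\">H2O</span></div></div>");
        Element name = (Element) doc.getElementsByTagName("span").item(0);
        System.out.println(cssPath(name)); // #chemical-list div.compound span.compound-name
    }
}
```

The resulting string could then be passed to a selector engine such as document.select( selector ). Note that it identifies a position in the markup, not necessarily a single element: every compound name in the list matches the same path, which is what the use case above wants.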
Question
What API can generate a CSS selector (as unique as possible) from a DOM element?
If there is no existing API, that would be nice to know.
Why not just use Java's actual JavaScript engine and run plain JavaScript?
function getSelector(element) {
  var selector = element.id;
  // If we have an id, that's all we need. Ids are unique. The end.
  if (selector) {
    return "#" + selector;
  }
  selector = [];
  var cl;
  // Walk up the tree, recording "tag.class1.class2" for each ancestor.
  while (element.parentNode) {
    cl = element.getAttribute("class");
    cl = cl ? "." + cl.trim().replace(/ +/g, '.') : '';
    selector.push(element.localName + cl);
    element = element.parentNode;
  }
  return selector.reverse().join(' ');
}
And let's verify it against
<div class="main">
  <ul class=" list of things">
    <li><a href="moo" class="link">lol</a></li>
  </ul>
</div>
with
var a = document.querySelector("a");
console.log(getSelector(a));
http://jsfiddle.net/c8k6lxtj/ -- result: html body div.main ul.list.of.things li a.link
... and we're gold.
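One caveat: a tag-and-class path like the one above can match several sibling elements, so it is not necessarily unique. If a selector that pins down exactly one element is wanted, a common variation is to add :nth-child positions and child combinators. The following is a rough Java sketch of that idea, assuming the JDK's org.w3c.dom API and well-formed input; the names UniqueCssPathSketch, childIndex, and uniqueCssPath are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not an existing library API.
final class UniqueCssPathSketch {

    /** Parses well-formed XML/XHTML with the JDK parser (assumption: input is well-formed). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** 1-based position among element siblings, which is what :nth-child counts. */
    static int childIndex(Element element) {
        int index = 1;
        for (Node n = element.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                index++; // text and comment nodes are skipped
            }
        }
        return index;
    }

    /** Builds a root-anchored chain of "tag:nth-child(n)" steps joined by child combinators. */
    static String uniqueCssPath(Element element) {
        Deque<String> parts = new ArrayDeque<>();
        Node node = element;
        while (node instanceof Element) {
            Element current = (Element) node;
            boolean isRoot = !(current.getParentNode() instanceof Element);
            String step = current.getTagName()
                    + (isRoot ? ":root" : ":nth-child(" + childIndex(current) + ")");
            parts.addFirst(step);
            node = current.getParentNode();
        }
        return String.join(" > ", parts);
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<table><tbody>"
              + "<tr><td>water</td><td>H2O</td></tr>"
              + "<tr><td>sodium hypochlorite</td><td>NaClO</td></tr>"
              + "</tbody></table>");
        Element cell = (Element) doc.getElementsByTagName("td").item(2);
        System.out.println(uniqueCssPath(cell));
        // table:root > tbody:nth-child(1) > tr:nth-child(2) > td:nth-child(1)
    }
}
```

The trade-off is that a positional selector is brittle: it breaks as soon as the page inserts or removes a sibling, whereas the class-based path tends to survive such changes. Which one fits depends on whether one element or a whole group of similar elements is the target.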