java - Derive CSS selector based on Element instance


background

Many questions ask how to obtain a particular DOM element given a CSS selector. This question is about the opposite direction. The document is parsed using jsoup, but it could be converted to another DOM API if necessary.

use case

For a particular problem domain (e.g., chemical compounds), thousands of web pages list chemicals in similar ways, but the markup differs across web sites. For example:

    <div id="chemical-list">
      <div class="compound">
        <span class="compound-name">water</span>
        <span class="compound-periodic">h2o</span>
      </div>
      <div class="compound">
        <span class="compound-name">sodium hypochlorite</span>
        <span class="compound-periodic">naclo</span>
      </div>
    </div>

Another site might list them differently:

    <ul class="chemical-compound">
      <li class="chem-name">water, h2o</li>
      <li class="chem-name">sodium hypochlorite, naclo</li>
    </ul>

Yet another site might, again, use different markup:

    <table border="0" cellpadding="0" cellspacing="0">
      <tbody>
        <tr><td>water</td><td>h2o</td></tr>
        <tr><td>sodium hypochlorite</td><td>naclo</td></tr>
      </tbody>
    </table>

A few sample pages from each of the thousands of sites will be downloaded. Then, using an existing list of chemicals, it is relatively simple to retrieve a list of candidate web page elements. Using jsoup, this is as simple as:

    Elements elements = chemicals.getElementsMatchingOwnText( chemicalNames );

This will allow for high-precision analysis across thousands of pages. (The page can discuss applications for water and sodium hypochlorite, but only the list is being analyzed.) Knowing the CSS selector will simplify the analysis and increase its accuracy.

The alternative is to process the entire page looking for "groups" of chemicals, then try to extract the list. Both problems are difficult, but using a CSS selector to jump to the exact spot in the page is far more efficient, and likely far more accurate. Both problems will require some hand-crafting, but I'd like to automate away as much as possible.

problem

The aforementioned APIs do not appear to have methods that generate a CSS selector given an Element instance (the more unique the better). It is possible to iterate through the parent elements and generate the selector manually. This has been demonstrated using JavaScript in a few questions. There are also answers for generating an XPath, and it might be possible using Selenium.

Specifically, how would you do something like:

    String selector = element.getCssPath();
    Elements elements = document.select( selector );

This would:

  1. Return a CSS selector for the given element.
  2. Search the document for the given CSS selector.
  3. Return the list of elements that match the selector.

The second line is not an issue; the first line is problematic.
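For illustration, the manual approach mentioned above (iterating through the parent elements) might look roughly like the following. This is only a sketch under assumptions: it uses the JDK's built-in `org.w3c.dom` classes rather than jsoup, it expects well-formed markup, it assumes `id` values are unique and need no CSS escaping, and the names `UniqueCssPath`, `uniqueSelector`, `childIndex`, and `parse` are made up for this example. Each step is pinned with `:nth-child(n)`, and the walk stops at the first ancestor that has an `id`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class UniqueCssPath {

    /** Hypothetical helper: parses well-formed markup with the JDK's XML parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    /** 1-based position of the element among its element siblings (text nodes are skipped). */
    static int childIndex(Element element) {
        int index = 1;
        for (Node n = element.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                index++;
            }
        }
        return index;
    }

    /** Walks from the element up to the root, or to the nearest ancestor with an id. */
    static String uniqueSelector(Element element) {
        Deque<String> steps = new ArrayDeque<>();
        Node node = element;
        while (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
            Element current = (Element) node;
            String id = current.getAttribute("id"); // empty string when absent
            if (!id.isEmpty()) {
                // Assumption: ids are unique, so the path can be anchored here.
                steps.addFirst("#" + id);
                break;
            }
            Node parent = current.getParentNode();
            boolean isRoot = parent == null || parent.getNodeType() != Node.ELEMENT_NODE;
            steps.addFirst(isRoot
                    ? current.getTagName()
                    : current.getTagName() + ":nth-child(" + childIndex(current) + ")");
            node = parent;
        }
        return String.join(" > ", steps);
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<div id=\"chemical-list\">"
              + "<div class=\"compound\"><span>water</span><span>h2o</span></div>"
              + "<div class=\"compound\"><span>sodium hypochlorite</span><span>naclo</span></div>"
              + "</div>");
        Element span = (Element) doc.getElementsByTagName("span").item(2);
        // prints: #chemical-list > div:nth-child(2) > span:nth-child(1)
        System.out.println(uniqueSelector(span));
    }
}
```

A selector built this way is tied to element positions, so it identifies one element on one page. To match every compound in a list, the positional parts would have to be relaxed by hand.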

question

What API can generate a CSS selector (as unique as possible) for a DOM element?

If there is no existing API, that would be nice to know.

Why not just use Java's actual JavaScript engine and run plain JavaScript?

    function getSelector(element) {
      var selector = element.id;

      // If we have an id, that's all we need. IDs are unique. The end.
      if (selector) { return "#" + selector; }

      // Otherwise, collect "tag.class1.class2" for the element and each ancestor.
      selector = [];
      var cl;
      while (element.parentNode) {
        cl = element.getAttribute("class");
        cl = cl ? "." + cl.trim().replace(/ +/g, '.') : '';
        selector.push(element.localName + cl);
        element = element.parentNode;
      }
      return selector.reverse().join(' ');
    }

And let's verify it against

    <div class="main">
      <ul class=" list of things">
        <li><a href="moo" class="link">lol</a></li>
      </ul>
    </div>

with

    var a = document.querySelector("a");
    console.log(getSelector(a));

http://jsfiddle.net/c8k6lxtj/ -- Result: html body div.main ul.list.of.things li a.link ... gold.
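If a script engine is not available, the same walk could probably be ported to Java directly. The following is a hedged sketch, not a drop-in for jsoup: it runs against the JDK's `org.w3c.dom` classes, assumes well-formed markup, and the names `SelectorPort` and `parse` are invented for the example. It mirrors the JavaScript above: an `id` short-circuits, otherwise each ancestor contributes `tag.class1.class2`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SelectorPort {

    /** Hypothetical helper: parses well-formed markup with the JDK's XML parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    /** Java rendering of the getSelector function shown above. */
    static String getSelector(Element element) {
        // If the element has an id, that is all that is needed.
        String id = element.getAttribute("id");
        if (!id.isEmpty()) {
            return "#" + id;
        }

        // Otherwise collect "tag.class1.class2" for the element and every ancestor.
        Deque<String> steps = new ArrayDeque<>();
        Node node = element;
        while (node != null && node.getNodeType() == Node.ELEMENT_NODE) {
            Element current = (Element) node;
            String classes = current.getAttribute("class").trim();
            String suffix = classes.isEmpty() ? "" : "." + classes.replaceAll("\\s+", ".");
            steps.addFirst(current.getTagName() + suffix);
            node = current.getParentNode();
        }
        return String.join(" ", steps);
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<html><body><div class=\"main\">"
              + "<ul class=\" list of things\">"
              + "<li><a href=\"moo\" class=\"link\">lol</a></li>"
              + "</ul></div></body></html>");
        Element a = (Element) doc.getElementsByTagName("a").item(0);
        // prints: html body div.main ul.list.of.things li a.link
        System.out.println(getSelector(a));
    }
}
```

Like the JavaScript version, this produces a descendant selector that can match several sibling elements, which is useful for grabbing a whole list but is not unique to one element.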

