Sunday, September 6, 2009

BiLE - finding out relationships

BiLE is a Bi-directional Link Extractor, a suite of Perl scripts created by SensePost. It uses HTTrack and Google to show you which websites have a strong relationship with your target's website.

The first interesting script is called BiLE.pl. When you run it against a target website, it starts HTTrack to fetch the target website and every website to which it can find hyperlinks. BiLE also queries Google using the "link:" directive. Using this Google hack it can find websites that link to the target website.

BiLE.pl produces two output files: a .mine file and a .walrus file. If you have a look at the .mine file, you'll see that each line is of the form source:destination.

Here is a sample of the output when I tested it:
www.target.org:jaxb.dev.java.net
www.target.org:jbind.sourceforge.net
www.target.org:jigsaw.w3.org
www.target.org:lists.w3.org
www.target.org:lists.xml.org
www.target.org:lucas.ucs.ed.ac.uk

This file only tells you that there is a link from your target website to a destination website. So there is a relationship between target and destination, but you can't tell how important it is. That is what the script BiLE-weigh.pl is for.
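If you want to look at the .mine data outside of BiLE, the source:destination format is easy to read with a few lines of code. The following is only a sketch in Python (BiLE itself is Perl); the function name parse_mine is made up for this example, and it assumes that hostnames never contain a colon themselves.

```python
from collections import defaultdict


def parse_mine(lines):
    """Group 'source:destination' lines into {source: set(destinations)}.

    Hypothetical helper, not part of BiLE. Assumes one pair per line and
    no colon inside the hostnames (for example, no port numbers).
    """
    links = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        source, destination = line.split(":", 1)
        links[source].add(destination)
    return links


# A few lines taken from the sample output above.
sample = [
    "www.target.org:jaxb.dev.java.net",
    "www.target.org:jigsaw.w3.org",
    "www.target.org:lists.w3.org",
]

links = parse_mine(sample)
for source, destinations in links.items():
    print(source, "links to", len(destinations), "distinct hosts")
```

To use it on a real file you would pass an open file handle instead of the sample list.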

BiLE-weigh.pl takes the output file of BiLE.pl and applies a weighing algorithm to determine the importance of the relationships between the target and the destinations. The readme contains a short description of how it works.

To get BiLE-weigh.pl up and running I had to alter the code, because it failed with the error "sort: open failed: +1: No such file or directory".

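Change this line from:
`cat temp | sort -r -t ":" +1 -n > @ARGV[1].sorted`;
to:
`cat temp | sort -r -t ":" -k 1 -n > @ARGV[1].sorted`;

Note that the obsolete +1 syntax means "skip one field", so its strict modern equivalent is -k 2 (the weight). With -k 1 the error goes away, but the file is not ordered by weight, so the result still has to be sorted afterwards.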

I found the solution on the minimalistic transparent x-desktop blog.

The output of BiLE-weigh.pl is something like this:
www.somesite.com:6.6
www.anothersite.com:4.02439024390244
subdomain.yetanothersite.com:75

The value at the end is the weight. The absolute value is meaningless; we are only interested in the rate of decay. A reasonably easy way to see it is to copy the content of the .sorted file (the output file of BiLE-weigh.pl) and paste it into a spreadsheet. In OpenOffice Calc a wizard pops up and asks you how it should handle the data. Your delimiter is a colon (:). Once you have the data in your spreadsheet, the last step is to sort it by weight, descending.
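If you would rather not use a spreadsheet, the same sorting can be done in a short script. This is a minimal sketch in Python, not something that ships with BiLE; the helper names parse_weights and sort_by_weight are invented for the example, and it assumes every line ends in a colon followed by a number.

```python
def parse_weights(lines):
    """Turn 'hostname:weight' lines into (hostname, weight) tuples."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        # The weight is whatever follows the last colon on the line.
        host, _, weight = line.rpartition(":")
        pairs.append((host, float(weight)))
    return pairs


def sort_by_weight(pairs):
    """Highest weight first, like a descending sort on the spreadsheet column."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# The three sample lines shown above.
sample = [
    "www.somesite.com:6.6",
    "www.anothersite.com:4.02439024390244",
    "subdomain.yetanothersite.com:75",
]

for host, weight in sort_by_weight(parse_weights(sample)):
    print(f"{host}\t{weight}")
```

For a real run you would pass the opened .sorted file to parse_weights instead of the sample list.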

Now you have a nice little list that tells you what relationships exist between your target website and other websites.

My output was:
www.target.com: 298.62
sub1.target.com: 165
target.wordpress.com: 165
tools.emailgarage.com: 75
www.mapsonline.be: 75

The next website has a weight of 6.6, so it drops dramatically and therefore you can assume that the interesting part stops here.
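I spotted that drop by eye, but it can also be found mechanically. The sketch below (Python, with a made-up function name cut_at_largest_drop) places the cut-off in front of the largest ratio between two neighbouring weights. That rule is my own assumption for illustration, not something BiLE or its readme prescribes, and it expects the weights to be positive and already sorted in descending order.

```python
def cut_at_largest_drop(weights):
    """Return how many leading entries to keep.

    `weights` must be positive and sorted in descending order.
    The cut is placed just before the biggest relative drop
    between two neighbouring weights. If there is no drop at all,
    every entry is kept.
    """
    best_index, best_ratio = len(weights), 1.0
    for i in range(1, len(weights)):
        ratio = weights[i - 1] / weights[i]
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return best_index


# The five weights from my output, followed by the next one (6.6).
weights = [298.62, 165, 165, 75, 75, 6.6]
keep = cut_at_largest_drop(weights)
print("keep the first", keep, "entries")
```

On these numbers the largest drop is from 75 to 6.6, so the first five entries are kept, which matches the manual reading.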

So these 5 lines of output allow you to assume that the target organization has real-life relations with wordpress.com, emailgarage.com and mapsonline.be.

Don't toss away the offline copies you now have of your target's website and of the websites that have a relationship with it, because source code analysis may tell us more about their systems.

3 comments:

fs111 said...

You win the "useless use of cat award" ;-) Almost all unix tools can read files on their own so the
"cat foo.txt | someothercommand"
can be
"someothercommand foo.txt"

Erik Vanderhasselt said...

Indeed, but it is not my solution and I think if you refer to somebody's solution you should not alter it. But you are absolutely right.

Guilherme Ferreira said...

Could you post a how-to for installing and configuring BiLE? When I run ./bile-public-ext.pl it shows a line "##Link to a site", but doesn't return anything.

Thanks.

Wilfer9
willferr@gmail.com