Web Crawling
Sunday, 17 July 2011 00:30
Web crawling is one of the processes on which a successful penetration test, or attack, depends. It is done with various tools; in this example we will discuss wget, Sam Spade, and Teleport Pro.
For protection services against web crawling see: Web Crawling Protection

Web Crawling:
Web crawling is an essential part of penetration testing. Beyond gathering information about the entire web location, there also lies the art of gathering documents you may not be able to find through traditional methods. Tools like Sam Spade can grab "interesting information" right from the scan and save it for you. You can also combine tools like wget with grep to pilfer information from the downloaded files. What is worse? While downloading files you may stumble upon links that were meant to be hidden, or files that should not be found.

Unix Web Crawling:
For the next few segments we will look at the Linux side, specifically the tools wget and grep. Many web sites employ some tactics to disable web crawling. If you notice you're getting the following errors:
If you are experiencing the 403 Forbidden error, this could indicate that the server is NOT allowing web crawlers and is filtering requests based upon the user agent setting. To bypass this restriction, enter the following:

wget -r --user-agent=FireFox weblocation.com

This downloads the web location recursively while presenting a different user agent, which bypasses restrictions that key on wget's default user agent.

To also write wget's output to a log file (so you can grep for "interesting information" in one location), issue the following:

wget -r --user-agent=FireFox weblocation.com -o webcrawl.log

Another tactic wget can employ is fetching a list of URLs from a file. To accomplish this, pass the file to wget:

wget --user-agent=foo --input-file=foo

Your input file should contain the URLs one under another:

http://site.com/file.ext

Using Grep:
Grep, if used correctly, is a very powerful tool. We will also show some minor strings you can search for to gain information that helps you along in a penetration test. The -i switch ignores case; -r recurses through directories (useful because wget saves a folder for each nested file); -A prints the given number of trailing lines after each match; and -B does the same as -A, but for the lines preceding the match. For example:

grep -i -r -B 3 -A 4 foo .

ignores case for foo (matching FOO, foo, and other variations), prints the 3 lines before each match and the 4 lines after it, and runs recursively through every directory under the current one.

Search Terms:
As you may have guessed, there are certain search terms you can look for to make your dissection of the web application a bit easier.
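As a rough illustration of the wget and grep usage covered so far, here is a minimal shell sketch. It assumes GNU wget and GNU grep; weblocation.com, urls.txt, webcrawl.log and the mirror/ folder are placeholder names, and the sample page is made up so the grep step has something to search. The wget commands are only printed, not run, so the sketch stays offline.

```shell
#!/bin/sh
# Sketch only: weblocation.com, urls.txt, webcrawl.log and mirror/ are placeholder names.
set -eu

workdir=$(mktemp -d)

# One URL per line, as wget --input-file expects.
printf '%s\n' 'http://site.com/file.ext' > "$workdir/urls.txt"

# The crawl commands are printed, not run, so the sketch stays offline.
# Drop the leading "echo" to run them for real.
echo wget -r --user-agent=FireFox -o webcrawl.log weblocation.com
echo wget --user-agent=FireFox --input-file="$workdir/urls.txt"

# Made-up stand-in for a page that wget -r would have saved in its own folder.
mkdir -p "$workdir/mirror/admin"
cat > "$workdir/mirror/admin/index.html" <<'EOF'
<html>
<p>Welcome</p>
<!-- FOO: staging login below -->
<a href="/login">login</a>
</html>
EOF

# -i ignores case, -r recurses, -B 1 / -A 1 print one line before and after each match.
hits=$(grep -i -r -B 1 -A 1 foo "$workdir/mirror")
printf '%s\n' "$hits"
```

In grep's context output, the matching line is marked with a colon after the file name and the surrounding context lines with a hyphen, which makes it easy to tell the hit from its neighbors.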
Such terms range from comments to passwords, e-mail addresses, phone numbers, and so on. Here is a small listing of what to search for:

One can also test for robots, readme, and other default files that come bundled with various web applications and COTS products. Don't forget to experiment!

Windows Web Crawling - Sam Spade:
Sam Spade is quite the character, and he's got a few tricks up his sleeve. This section deals ONLY with the web crawling portion of Sam Spade, and how you can harness its built-in options to get results similar to those of wget and grep on Linux... Did we mention wget is also available for Windows? Anyway, after you've installed Sam Spade, load it up and follow the depictions in this graphic:
Once the option has been selected, you'll be greeted with another option set. This option set lets you configure Sam Spade's crawling mechanism and fine-tune what you will be searching for and obtaining. Below is an example of the option set.
The sections below explain each tag marked in red and what the options provide. It's important to understand the option sets of the specific tools you work with: if certain options are left off, you can, and most likely will, skip over important data.

Option Set 1: Sets the target location or web address you will be searching.
Option Set 2: Sets additional files and URLs to check while searching (as performed with the wget tool on Linux or Windows).
Option Set 3: Only fetches files of web content. This, however, can omit items such as text files [different extensions] and other details which may contain additional information to help in your penetration test or web assessment. The point of mirroring is to make an exact copy of the original.
Option Set 4: Provides a storage location where you can save the information you are mirroring. Including headers, and utilizing applications like Camera/Shy, may help you unearth steganography within images [the art of hiding encrypted messages in .bmp and other large images]. Copying these files is essential for the paranoid.
Option Set 5: These settings obtain information from within the HTML or web files that match the conditions in the checked fields. If needed, you can also search images not stored on the local host [again, if you are hunting down graphics which may contain steganography]. Links to other servers may point you to partner web locations or sister companies which may be vulnerable. The easiest route could possibly be through a neighbor and into the target! E-mail addresses can provide a solid foundation for social engineering, and with the options in the last stage [6] you can add further details to broaden your search.
Option Set 6: Here we have additional conditions which can be met and verified.
We are only showing the search for comments, as internal comments can often lead to exposures. You can also include area codes and other useful information, which can in turn lead to a full disclosure. The options are limitless, so long as the information you are unearthing exists in the document you are grepping. Experiment, and play around. Chances are you may uncover something which helps protect your own web location or organization before an attacker can find it.

Now that all the details for Sam Spade have been filled out and saved, the web crawl begins. Here is an example of a finished web crawl utilizing Sam Spade:
Windows Web Crawling - Teleport Pro:
Teleport Pro can basically do everything except make you more acceptable to your wife. The application is feature packed; however, for these examples we are only covering basic information gathering. You can also use the Windows tool, or write an extension or application of your own, to search the folders where the mirror was saved and return additional details. Here is the basic gist of getting the application started and off to do what it does best. Mirror!
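One possible shape for a small tool that searches a saved mirror is sketched below. It is not part of Teleport Pro or Sam Spade; it assumes a Unix-like shell with GNU grep and find (for example, after copying the mirror folder to a Linux box). MIRROR_DIR, the search helper, and the sample files are hypothetical names made up for illustration. It pulls out HTML comments and e-mail addresses, lists files that mention passwords, and looks for default files such as robots and readme.

```shell
#!/bin/sh
# Sketch only: MIRROR_DIR, search() and the sample files are made up for illustration.
set -eu

MIRROR_DIR=$(mktemp -d)

# Stand-in for a folder saved by wget, Sam Spade, or Teleport Pro.
mkdir -p "$MIRROR_DIR/site"
cat > "$MIRROR_DIR/site/index.html" <<'EOF'
<html>
<!-- TODO: remove test account before launch -->
<p>Contact: admin@example.com</p>
<input type="password" name="pw">
</html>
EOF
printf 'User-agent: *\nDisallow: /backup/\n' > "$MIRROR_DIR/site/robots.txt"

# Print the unique strings matching an extended regex anywhere under the mirror.
# -o prints only the matched text, -h drops file names, -i ignores case.
search() {
    grep -r -h -o -i -E "$1" "$MIRROR_DIR" | sort -u
}

comments=$(search '<!--.*-->')
emails=$(search '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

# Files that mention passwords at all (-l lists file names only).
pw_files=$(grep -r -l -i 'password' "$MIRROR_DIR" || true)

# Default files that often ship with web applications.
defaults=$(find "$MIRROR_DIR" -type f \( -iname 'robots.txt' -o -iname 'readme*' \))

printf 'Comments:\n%s\n' "$comments"
printf 'E-mail addresses:\n%s\n' "$emails"
printf 'Files mentioning passwords:\n%s\n' "$pw_files"
printf 'Default files:\n%s\n' "$defaults"
```

The comment pattern only catches comments that open and close on the same line; multi-line comments would need a different approach.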