User Enumeration Is A Big Deal

I’ve been participating in bug bounties with BugCrowd, and one of the first things I check for is username/email enumeration on the login page and the forgotten password page. Some of the companies running the bug bounties explicitly state that they will not pay for user enumeration vulnerabilities. These companies consider username/email enumeration a low-risk vulnerability because they believe phishers target large groups of email addresses rather than addresses associated with a particular target. They also consider it low-risk because they believe their account lockout policies will protect them from password attacks. Unfortunately, neither of these assumptions is correct.

Spammers typically send messages indiscriminately, while phishers typically send their messages to a specific set of targets. As an attacker, if I can use your login or forgotten password page to narrow my list from 10,000 targets to 1,000 valid targets, I will.

While account lockout policies are a good thing and can prevent certain password guessing attacks, they can also be worked around with proper timing. Also, depending on how long the account lockout lasts and whether the account must be manually reset, an attacker could easily cause a DoS for your users or your helpdesk personnel.

Finally, an attacker with a large enough set of valid email addresses would only need to try three or four common passwords with each email address to gain access to a significant number of accounts. These three or four failed password attempts will typically not trigger an account lockout.

Username/email enumeration is not the end of the world, but it is certainly something that should be fixed, and it is typically easy to fix. When a user fails to log in, don’t tell the user whether the username or the password was wrong; simply say the login attempt failed. When a user submits their username/email to the forgotten password form, don’t tell them whether the username/email was found; simply tell them that an email is on the way.
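If you want to see what that looks like in practice, here is a minimal Python sketch of the idea. It is not a drop-in fix: the toy user store and SHA-256 hashing are stand-ins for whatever user database and password hashing scheme your application actually uses.

import hashlib
import hmac

# Toy user store: email -> SHA-256 hash of the password. A real application
# should use a proper password hashing scheme (bcrypt, scrypt, etc.).
USERS = {'alice@example.com': hashlib.sha256(b'correct horse').hexdigest()}

GENERIC_LOGIN_ERROR = 'Login failed. Check your email address and password.'
GENERIC_RESET_MESSAGE = 'If that address is registered, an email is on the way.'

def login(email, password):
    stored = USERS.get(email)
    supplied = hashlib.sha256(password.encode()).hexdigest()
    # Same message whether the account is missing or the password is wrong,
    # so the response cannot be used to enumerate accounts.
    if stored is None or not hmac.compare_digest(stored, supplied):
        return GENERIC_LOGIN_ERROR
    return 'Login successful.'

def forgot_password(email):
    if email in USERS:
        pass  # queue the password reset email here
    # Same response either way; nothing leaks about which addresses exist.
    return GENERIC_RESET_MESSAGE

The timing of the two code paths should also be kept as close as possible, since a noticeably faster response for unknown accounts is another enumeration channel.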

UPDATE 6/13/2014:
A couple of people on Twitter pointed out that sites where users self-register will always have at least one username enumeration vulnerability: the account creation process. Another user said this vulnerability is not preventable without ruining the user experience. I’m not a UX guru, so I have no idea whether that is true.

In either case, the danger of username enumeration comes from the fact that an attacker is able to gather one of the two pieces of information needed to log in to the site. If we cannot prevent an attacker from getting half of the login information, maybe the answer is to require more login information, i.e., multi-factor authentication. With proper multi-factor authentication you still run the risk of creating a denial of service by locking out accounts, but you eliminate the more dangerous possibility of user accounts being compromised on a massive scale.

So, I Wrote a Book

Really it’s more of a training manual. Around June or July of last year I decided I wanted to teach an intro-level penetration testing class aimed at system administrators. The purpose of the class would be to teach sysadmins how to attack and, hopefully, better defend their systems. In the months before the class I wrote a detailed training manual covering all of the class topics, including a number of hands-on, step-by-step labs. Once the class was over, I decided to publish the training manual. After making a number of revisions based on feedback from the class and from an editor, I finally published it.

I would highly recommend this manual for anyone who is interested in learning penetration testing. It provides a nice overview of the common phases of a penetration test and the vulnerabilities commonly encountered along the way. The manual was originally written to be the basis for an in-person training event but can be used as a self-study guide as well. You can get a copy of the book for yourself, or for the system administrators in your life, from lulu.com.

I plan to offer more live training events based on the material in the book and would encourage others who want to do live training to use this book as the basis for their own events. If you use the book as the basis for a class, each student will need to purchase his or her own copy. To find out about my next live training event keep an eye on the training page at the ASG Consulting web site.

The labs in the manual require Kali Linux and Metasploitable2, which are both freely available here and here. I would like to thank Offensive Security and Rapid7 for making these great tools available.

On Robots and Sitemaps

Robots.txt and sitemap.xml files are extremely useful for managing web crawlers, but they can also be dangerous when they give away too much information. So what are these files and how are they used?

Many years ago, web site owners and the people who run web crawlers agreed on a method that lets site owners politely ask crawlers not to crawl certain portions of a site: the robots.txt file (the Robots Exclusion Standard). For the most part, web crawlers adhere to this standard.

A typical robots.txt file looks like this.

#Google Search Engine Robot
User-agent: Googlebot
Allow: /?_escaped_fragment_

Allow: /search?q=%23
Disallow: /search/realtime
Disallow: /search/users
Disallow: /search/*/grid

Disallow: /*?
Disallow: /*/followers
Disallow: /*/following

Disallow: /account/not_my_account

#Yahoo! Search Engine Robot
User-Agent: Slurp
Allow: /?_escaped_fragment_

Allow: /search?q=%23
Disallow: /search/realtime
Disallow: /search/users
Disallow: /search/*/grid

Disallow: /*?
Disallow: /*/followers
Disallow: /*/following

Disallow: /account/not_my_account

This is a portion of Twitter’s robots.txt file. Notice how Twitter tells the search engines which portions of the site they are and are not allowed to crawl.
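If you are curious how a well-behaved crawler interprets these rules, Python’s standard library includes a robots.txt parser. A quick sketch, using a user agent and paths taken from the file above (note the standard library parser does not understand wildcard rules like /*?):

import urllib.robotparser

# Fetch and parse Twitter's robots.txt, then check a few paths the way a
# polite crawler would before requesting them.
rp = urllib.robotparser.RobotFileParser()
rp.set_url('https://twitter.com/robots.txt')
rp.read()

for path in ('/search/users', '/account/not_my_account', '/'):
    allowed = rp.can_fetch('Googlebot', 'https://twitter.com' + path)
    print('%s -> %s' % (path, 'allowed' if allowed else 'disallowed'))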

Tonight, while surfing the web, I found this robots.txt file. I’ll let you guess which site it belongs to.

Sitemap: http://data.healthcare.gov/sitemap-data.healthcare.gov.xml

Notice that there are no Disallow directives. Based on the accepted convention, the web site owner gives all web robots permission to crawl the entire site. Theoretically, you could write your own robot and legally crawl the entire site.

Along with the agreement to use robots.txt files, web site owners and web crawlers also decided to use a sitemap.xml file to explicitly define the structure of the web site and the URLs “on a website that are available for crawling” (Sitemaps). The Sitemap directive can be added to the robots.txt file to tell web crawlers where to find the sitemap file.

If we look at the sitemap for data.healthcare.gov, we can see the URLs which, by convention, we are EXPECTED to crawl or visit as users.

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://data.healthcare.gov/sitemap-datasets-data.healthcare.gov0.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://data.healthcare.gov/sitemap-users-data.healthcare.gov0.xml</loc>
  </sitemap>
</sitemapindex>

This sitemap file tells us about two additional sitemap files. The sitemap-users-data.healthcare.gov0.xml file looks interesting.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://data.healthcare.gov/profile/Bill-Fencken/ta6q-868x</loc>
    <lastmod>2010-12-06</lastmod>
  </url>
  <url>
    <loc>http://data.healthcare.gov/profile/Wahid-Saleemi/u63n-yr8b</loc>
    <lastmod>2010-12-06</lastmod>
  </url>
  <url>
    <loc>http://data.healthcare.gov/profile/Debbie-York/45wa-c5qw</loc>
    <lastmod>2010-12-06</lastmod>
  </url>

This sitemap file lists the profile links for a number of user accounts; in fact, it provides links to approximately 3900 of them. Again, based on convention, robots are EXPECTED to visit each of these links and download the page behind it. You can see that Google did exactly this by running this query.
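Nothing stops you from doing the same thing a crawler does. Here is a rough Python sketch that pulls the profile links out of that sitemap using the requests library and the standard library XML parser (the URL is the one from the sitemap index above):

import xml.etree.ElementTree as ET

import requests

SITEMAP = 'http://data.healthcare.gov/sitemap-users-data.healthcare.gov0.xml'
NS = '{http://www.sitemaps.org/schemas/sitemap/0.9}'

resp = requests.get(SITEMAP)
root = ET.fromstring(resp.content)

# Each <url> element contains a <loc> with a link like
# http://data.healthcare.gov/profile/First-Last/xxxx-xxxx
profiles = [url.find(NS + 'loc').text for url in root.findall(NS + 'url')]
print('Found %d profile URLs' % len(profiles))
for link in profiles[:5]:
    print(link)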

So, let’s download a page. When you visit a page in a browser, the page is downloaded and rendered by the browser. When a robot or search engine downloads the page, it reads and parses the HTML code that makes up the page. To see this HTML code, you can right-click on the page in your browser and choose View Source, or something similar depending on your browser. The source code looks like this.

[Screenshot: page source of a data.healthcare.gov profile page]

Note that the source code is full of links to other portions of the data.healthcare.gov site, which, by convention, we are allowed to crawl because the robots.txt file does not define any disallowed portions of the site. One such link, /api/users/ta6q-868x?method=contact, is found about 1/3 of the way down the page. Visiting this page produces an error message in JSON format, which means a web crawler like Google will likely ignore it.

{
  "code" : "method_not_allowed",
  "error" : true,
  "message" : "GET not allowed"
}

On a more serious note, a typical attack against websites involves enumerating user accounts and then attempting to brute-force the associated passwords. Usually an attacker has to work to find a method to enumerate user accounts, but in this case the sitemap file hands over a list of them. Personally, I think it would be wise to remove the sitemap file at http://data.healthcare.gov/sitemap-users-data.healthcare.gov0.xml.

Finding Weak Rails Security Tokens

The other day I was reading about the dangers of having your Rails secret token in your version control system. The TL;DR version is that the secret token is used to calculate the HMAC of the session data in the cookie. If you know the secret token, you can send arbitrary session data and execute arbitrary code.

So I decided I’d go digging through GitHub to see if anyone had uploaded secret tokens to the site. Sure enough, there were more than a few. This isn’t all bad, because Rails allows different configuration settings for the same application depending on whether it is running in production or development, and most of the Rails apps used a strong secret_token for the production site, read from an environment variable or generated by SecureRandom, but a weak secret_token for the development site.

I took a few minutes to record the secret tokens I found and decided to see if I could find any of them in use on Internet-facing sites. To test this, I went to Shodan to find Rails servers and found approximately 70,000 of them. I downloaded the details for about 20,000 of those servers and looked at the cookies to identify the ones running Rails apps. Rails cookies are distinct because they consist of a base64-encoded string followed by -- and then an HMAC of the base64 string. This gives a cookie that looks like this.

_Lm2Web_session=BAh7BjoPc2Vzc2lvbl9pZCIlOGY0NTUyMWIyMDMw
NzVmNzI1NjY2ZWEyODg0MzY0ODA%3D--1cad1b4cd816f15162af4ab
97598032a994668be

Of the roughly 20,000 Rails servers for which I had details, only about 10,000 had cookies that matched the pattern above.
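Spotting those cookies is a simple pattern match. The regular expression below is an assumption about how the match could be done, not the exact check from rails_find.py: URL-escaped base64 data, a literal --, then a 40-character SHA1 hex digest.

import re

# <escaped base64 data>--<40 hex character SHA1 digest>
RAILS_COOKIE = re.compile(r'^[A-Za-z0-9+/=%]+--[0-9a-f]{40}$')

def looks_like_rails(cookie_value):
    return RAILS_COOKIE.match(cookie_value) is not None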

The digest portion of the cookie is produced by calculating the HMAC of the base64 string using the SHA1 hashing algorithm with the secret token as the key. To find the secret token, we simply calculate the HMAC using each of the potential secret tokens as the key and see if the calculated digest matches the digest in the cookie. Of the approximately 10,000 cookies, I was able to find the secret token for only 7. This is not very impressive at all, but it gave me hope to try a larger test.
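The check itself is only a few lines of Python. This is a simplified, single-cookie sketch of the idea rather than the actual rails_secret_token.py script; it assumes the session cookie value has already been extracted from the Set-Cookie header.

import hashlib
import hmac
import urllib.parse

def find_secret(cookie_value, candidate_tokens):
    # Rails 2/3 session cookies look like '<escaped base64 data>--<hex digest>'
    # and the HMAC-SHA1 digest is calculated over the unescaped base64 data.
    data, _, digest = urllib.parse.unquote(cookie_value).rpartition('--')
    for token in candidate_tokens:
        calc = hmac.new(token.encode(), data.encode(), hashlib.sha1).hexdigest()
        if hmac.compare_digest(calc, digest):
            return token
    return None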

I decided to check the Alexa top 1 million web sites to see how many used a cookie with a digest, and for how many I could find the secret token. I’ve tested about 40,000 sites so far and have found only 303 that use a cookie matching the pattern above. Of those 303 sites, I did not find any of the secret tokens. The results are not surprising, and I realize this is a long shot that will probably come to nothing, but sometimes you just have to test a theory. If I finish the testing, I’ll update the blog post with the final stats.

Although I haven’t tried it yet, I believe that if you ran the same test on an internal network you would have more success, because there are more likely to be development Rails servers on an internal network. If you’d like to try this on your network, you can get the rails_find.py, rails_secret_token.py, and rails_secret_tokens.txt files here. The rails_find.py script takes a list of host names or IP addresses and writes any matching cookies to a file. The rails_secret_token.py script takes a file of cookies and the rails_secret_tokens.txt file and tests each token against each cookie.

If you do find a secret token during your testing, Metasploit will get you remote code execution.

Enjoy.

Introduction to Python

I did a quick presentation tonight for the Chattanooga Python Users Group.
[Slides: Introduction to Python]

Hack Yourself First: An Introduction to Penetration Testing

I will be teaching an introductory penetration testing class on December 14, 2013. If you are a system administrator in the Chattanooga area, check it out. You can get the syllabus for the class here, http://asgconsulting.co/static/files/hyf_syllabus.pdf

Will Write Code For Friends

The other day my friend Slade asked me to write a script to take an address range, run an Nmap ping scan against it, and then run a SYN scan against only the live hosts using a predefined set of ports. Finally, he wanted a simple output showing the hosts and only the open ports. So, I put together this short Python script. The usage is below:

USAGE:

discover.py IP_addresses [ports]

Addresses must be a valid Nmap IP address range and ports
must be a valid Nmap port list. Any ports provided will be
added to the default ports that are scanned: 21, 22, 23,
25, 53, 80, 110, 119, 143, 443, 135, 139, 445, 593, 1352,
1433, 1498, 1521, 3306, 5432, 389, 1494, 1723, 2049, 2598,
3389, 5631, 5800, 5900, and 6000. The script should be run
with root privileges.

The script uses the -oA switch to save the Nmap results for both the ping scan and the SYN scan. The gnmap file from the SYN scan is then parsed to produce a simple Markdown file that looks like this:

192.168.1.2
===========

OS
--
HP Officejet J4680 printer|HP PhotoSmart C390 or C4780; or
Officejet 6500, 7000, or 8500 printer|HP Photosmart C4500
or C7280, or Officejet J6450 printer

Ports
-----
tcp/80 (open) - Virata-EmWeb 6.2.1 (HP Photosmart C4700
series printer http config)
tcp/139 (open) - tcpwrapped
tcp/445 (open) - netbios-ssn


192.168.1.1
===========

OS
--
Apple AirPort Extreme WAP or Time Capsule NAS device (NetBSD
4.99), or QNX 6.5.0

Ports
-----
tcp/53 (open) - domain?
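The gnmap parsing behind that output is straightforward. Here is a rough sketch of the approach rather than the actual script: every Host line with a Ports section lists entries in the form port/state/protocol/owner/service/rpcinfo/version/, so it is mostly string splitting.

def parse_gnmap(path):
    # Return {ip: [(protocol, port, service, version), ...]} for open ports.
    hosts = {}
    with open(path) as gnmap:
        for line in gnmap:
            if not line.startswith('Host:') or 'Ports:' not in line:
                continue
            ip = line.split()[1]
            ports_section = line.split('Ports:')[1].split('Ignored State:')[0]
            open_ports = []
            for entry in ports_section.split(','):
                fields = entry.strip().split('/')
                if len(fields) >= 7 and fields[1] == 'open':
                    open_ports.append((fields[2], fields[0], fields[4], fields[6]))
            hosts[ip] = open_ports
    return hosts

From there, producing the Markdown shown above is just string formatting.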

In addition to the discover.py script, I created the gnmap2md.py script which converts gnmap formatted files into Markdown formatted files. You can get it here.
As always, I hope you enjoy the script and let me know if you have any trouble with it.

Chattanooga Technology Meetup

This weekend we had a technology meetup at the 4th Floor in the downtown branch of the Chattanooga Library. We had a good turnout, and I was able to talk about two of my favorite subjects, Python and infosec.

Here’s the presentation I gave on the Requests library for Python.

Multiprocessor HTML Login Form Brute Force

The other day I needed to brute force an HTTP basic auth login, so I fired up Metasploit, as I usually do, and tried to run the auxiliary/scanner/http/http_login module. The module crashed and printed out a stack trace. Instead of spending time troubleshooting it, I decided to throw together a quick Python script, so I used my multiprocessor SSH brute force script as a template and put together a multiprocessor basic auth script. Well, the next day I needed to brute force an HTML login form, so I decided to write a Python script to do that as well.

HTTP Basic Auth is quite easy to brute force because after the credentials are sent, the server responds with a 401 status code if they were wrong and either a 2xx or 3xx status code if they were correct. HTML login forms are much more difficult because there are often cookies that must be set and hidden fields included in the form, typically for CSRF purposes. In addition, the body of the server response must be parsed to determine if the login failed or succeeded. So, brute forcing an HTML login form follows a pattern like this (a rough sketch in code follows the list).

  1. GET the login page so that any needed cookies are set.
  2. Parse the login form for any hidden fields and associated values that must be sent in addition to the credentials.
  3. POST the login form with the credentials and any hidden fields.
  4. Parse the response to see if a login failure has occurred and to update the value of any hidden fields.
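Both cases boil down to a handful of calls with the requests library. The sketch below is a simplified, single-process illustration, not the actual scripts (which add multiprocessing and read the configuration file described next); it also uses BeautifulSoup to pull out the hidden fields.

import requests
from bs4 import BeautifulSoup

def try_basic_auth(url, username, password):
    # Basic auth: a 401 means rejected; a 2xx or 3xx means the credentials worked.
    resp = requests.get(url, auth=(username, password), allow_redirects=False)
    return resp.status_code != 401

def try_form_login(login_url, action_url, ufield, pfield,
                   username, password, hidden_names, fail_str):
    session = requests.Session()                 # 1. GET the login page to pick up cookies
    soup = BeautifulSoup(session.get(login_url).text, 'html.parser')

    data = {ufield: username, pfield: password}
    for name in hidden_names:                    # 2. copy hidden fields (CSRF tokens, etc.)
        field = soup.find('input', {'name': name})
        if field is not None:
            data[name] = field.get('value', '')

    resp = session.post(action_url, data=data)   # 3. POST credentials plus hidden fields
    return fail_str not in resp.text             # 4. success if the failure string is absent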

I built a script that can automate the process, but it does require some manual intervention in the form of a configuration file. The configuration file, shown below, is in JSON format. First, set the login URL and the action URL (the URL the form gets POSTed to). Next, set the field names for the username and password and set the files that contain the lists of usernames and passwords to try. Next, set the string of text that will appear in the failure message and set the number of threads that should be used. Finally, define the names of any hidden fields that should be included in the login form.

{
	"login": "https://domain/login/url",
	"action": "https://domain/login/action",
	"ufield": "login",
	"pfield": "password",
	"ufile": "user",
	"pfile": "pass",
	"fail_str": "Some string that shows our login failed",
	"threads": "1",
	"hidden": [
		"hidden_field_name1", "hidden_field_name2"
		]
}

The script will first GET the login page defined in the config file, set any necessary cookies, and parse the page for the values of the hidden fields defined in the config file. Next, the script POSTs the login credentials and the hidden fields with their values to the action page defined in the config file. Finally, the response is parsed to find the failure string and to update the value of any hidden fields. If the failure string is present in the response, the process is repeated with a new set of credentials. If not, the script will stop and print the credentials that succeeded.

The script and a sample configuration file can be downloaded from the Scripts repository on my GitHub account, https://github.com/averagesecurityguy/scripts. As always, let me know if you have any questions or trouble running the script.

Facebook WTF?

I do not use Facebook, and after a few years I finally convinced my wife to give it up. In my opinion, the social benefits of Facebook are far outweighed by the privacy and security concerns. To demonstrate, my father-in-law recently received a phishing message through facebookmail; see the screenshot below.

[Screenshot: the phishing email]

The email has all the typical signs of a phishing email, including the bad grammar and the FUD meant to get you to click on the link. The only problem is that the link is a legitimate Facebook URL. Confused, I fired up a VM and visited the link, which took me to this page.

[Screenshot: the Facebook page the link leads to]

The page appears to be a security warning with a URL at the bottom. I think most Facebook users would see this as normal and click Continue. In fact, the page is designed to let you know you are leaving Facebook to go to the displayed URL, but the only indication that you are leaving Facebook is the title of the page.

[Screenshot: the page title indicating you are leaving Facebook]

I thought to myself, “That can’t be right; maybe a logged-in user gets a different message.” So, I created an account and visited the link again. This time I got a warning message letting me know the link was potentially spammy.

[Screenshot: the spammy-link warning shown to a logged-in user]

Excellent, Facebook is watching out for its users and protecting them from spammy links. Not so fast. If you look at the phishing URL closely, you can see it has three parts: http://www.facebook.com/l/, a random string, and the redirect URL. I decided to make some changes to the phishing URL to see what would happen.

If you modify the random string, the warning message is no longer displayed because Facebook doesn’t recognize the new URL as malicious. This means Facebook is flagging malicious links based on the full URL and not on the destination URL. Based on this, it seems that scammers could set up one site, create many different URLs that redirect to it, and likely never be caught by Facebook.

To prevent problems with these types of links, Facebook should make it very clear that the user is leaving Facebook to go to a new site; a message in the page title is not enough. In addition, Facebook should determine whether a link is “spammy” based on the destination URL, not the original URL.

I reported this as a potential bug, but Facebook didn’t seem to see it as one. Maybe I’m crazy; what do you think?