Content

Creating Custom Userlists from Document Metadata

In the past on the podcast we’ve talked about a number of tools for document metadata gathering and how we can use them for gathering good information.
docx.jpgI’ve talked about EXIFtool for examining and deleting metadata from JPEGs. This was helpful for some info, but only on images.
I’ve covered Metagoofil, where we use it to download all sorts of common data and word processing type documents and analyze them for interesting information. Unfortunatley, Metagoofil only will produce download from the web and process. We have no ability to process from our store on disk.
By accident I discovered that we can get much of the same information by using EXIFtool not on JPEGs, but on Word, Excel and PowerPoint documents! EXIFtool has the ability to parse metadata as defined by the FlashPix standard, introduced in 1996 developed by Kodak, Hewlett-Packard and Microsoft. Microsoft still uses the format for documents and storing data. We can use EXIFtool to gather usernames from the documents.
Note: This will only work on Office documents were not created with Office 2007 (.docx), as the new version relies on a different metadata storage format. I’ll have a solution for this one soon!
We can start down and dirty with getting the information on Office documents. In the directory that contains our supported office documents, we can execute the following commmand:

$ exiftool -r -h -a -u -g1 * >output.html

metastick.jpgThis will execute EXIFtool to extract all EXIF metadata recursively in the current directory (-r), with all output including duplicates (-a), organizing by EXIF tag category (–g1), for all files, with HTML friendly formatting (-h), into a file named output.html in the current directory (>output.html). With this we get a handy little report HTML report!
But, we may only want just the info on usernames/authors. We can trim the output information down to jsut the appropriate data elements:

$ exiftool -r -a -u -Author -LastSavedBy * >users.txt

We’ve removed the HTML and sorting options, as they will only serve to make any additional processing difficult. I’ve also only grabbed the Author and LastSavedBy tags, as these are the most common places for usernames. Now we can take our users.txt, and remove all of the extra information with some unix text processing:

$ strings users.txt | cut -d":" -f2 | grep -v "=" | grep -v "image files read" | tr '[:space:]' 'n' | sort | uniq  >cleanusers.txt

Now all we are left with is a list of potential user names one per line. We’ve dropped all of the extra text up to the first delimiter (:), dropped the lines that start with “=” and “image files read”, coverted spaces to newlines, sorted alphabetically and removed the duplicates. This will introduce some need for a manual culling, as sometimes the author is listed as “Firstname Lastname”, and they get kept as each name individually. However, in some smaller companies just a first or last name is perfectly acceptable as a username, so you may not want to to cull your list at all.
Now, we are left with a list of potential usernames that we can utilize for password brute force attempts for other services, such as VPNs or web based applications.

Larry Pesce

Larry’s core specialties include hardware and wireless hacking, architectural review, and traditional pentesting. He also regularly gives talks at DEF CON, ShmooCon, DerbyCon, and various BSides. Larry holds the GAWN, GCISP, GCIH, GCFA, and ITIL certifications, and has been a certified instructor with SANS for 5 years, where he trains the industry in advanced wireless and Industrial Control Systems (ICS) hacking. Larry’s independent research for the show has led to interviews with the New York Times with MythBusters’ Adam Savage, hacking internet-connected marital aids on stage at DEFCON, and having his RFID implant cloned on stage at Shmoocon. Larry is also a Principal Instructor and Course Author for the SANS Institute for SEC617: Wireless Penetration Testing and Ethical Hacking and SEC556: IoT Penetration Testing. When not hard at work, Larry enjoys long walks on the beach weighed down by his ham radio, (DE KB1TNF), and thinking of ways to survive the impending zombie apocalypse.

Get daily email updates

SC Media's daily must-read of the most current and pressing daily news

By clicking the Subscribe button below, you agree to SC Media Terms of Use and Privacy Policy.

You can skip this ad in 5 seconds