Unboxing the XKeyScore framework

One of the shocking revelations by the whistleblower Edward Snowden threw light on XKeyScore, a surveillance tool used by the National Security Agency (NSA). XKeyScore can collect data both breadth-wise and depth-wise. Its input is the vast, continuous stream of internet traffic (the data packets flowing all the time), from which it retrieves and produces meaningful data. To handle the processing power and storage needed for billions of data records captured from all over the world, XKeyScore uses distributed processing: it divides tasks among many systems spread across the world (similar to the map-reduce approach), taking advantage of distributed computing. To query all of these distributed systems, XKeyScore uses a federated querying approach, which is discussed later in the article.
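To make the map-reduce comparison concrete, here is a minimal, generic sketch of the idea. It is not XKeyScore code: the partitions, the record fields and the "count traffic by protocol" task are all hypothetical. Each site runs the map step over only its own local records, and the partial results are then merged in a reduce step.

```python
from collections import Counter
from functools import reduce

# Hypothetical records, already split into partitions as if each
# partition were stored on a different collection site.
PARTITIONS = [
    [{"protocol": "smtp"}, {"protocol": "http"}],
    [{"protocol": "http"}, {"protocol": "http"}],
    [{"protocol": "xmpp"}, {"protocol": "smtp"}],
]


def map_partition(records):
    """Map step: each site summarizes only its own local records."""
    return Counter(record["protocol"] for record in records)


def reduce_counts(left, right):
    """Reduce step: merge two partial summaries into one."""
    return left + right


# In a real deployment each map_partition call would run on its own machine;
# here they simply run one after another.
partials = [map_partition(p) for p in PARTITIONS]
totals = reduce(reduce_counts, partials, Counter())
print(dict(totals))
```

The point of the design is that the raw records never have to be shipped to one place; only the small per-site summaries travel over the network.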
The following image depicts how huge data dumps from the logs database are processed to produce meaningful data. It shows the working of one of the systems on the distributed network; there are several such systems on the network.
Fig: Working of XKeyScore on a single system
The system makes sense of the flowing traffic by analyzing it with pre-written code snippets that perform pattern matching. For example, an email has a recognizable format (from, to, subject, message body, signature, etc.), which the code snippet recognizes; the data is then grouped and returned at the analyst's terminal when there is a query related to email. These code snippets are called plug-ins in the framework. There are plug-ins to extract and index email addresses, phone numbers (usually extracted from message signatures), and files along with their extensions. There are also plug-ins that collect full logs and user activity from instant messengers, and the user activity plug-in can extract the messenger's contact list as well. Analysts are restricted from querying or analyzing traffic in a way that would spy on US traffic itself.
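The plug-in idea can be illustrated with a small, hypothetical registry of regular expressions. This is only a sketch of pattern-matching extractors in general: the plug-in names, the patterns and the sample payload are assumptions for illustration, not the real XKeyScore plug-ins (which are far more elaborate).

```python
import re

# Hypothetical plug-in registry: plug-in name -> compiled pattern.
PLUGINS = {}


def register_plugin(name, pattern):
    """Register a pattern-matching plug-in under a name."""
    PLUGINS[name] = re.compile(pattern)


register_plugin("email_address", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
register_plugin("phone_number", r"\+?\d[\d -]{7,}\d")
register_plugin("file_name", r"\b[\w-]+\.(?:pdf|docx?|zip|jpg)\b")


def run_plugins(payload):
    """Run every plug-in over a payload and index the hits by plug-in name."""
    index = {}
    for name, pattern in PLUGINS.items():
        hits = pattern.findall(payload)
        if hits:
            index[name] = hits
    return index


# Hypothetical payload: an email whose signature contains a phone number.
PAYLOAD = (
    "From: alice@example.com\n"
    "Subject: report\n"
    "\n"
    "See attached report.pdf\n"
    "--\n"
    "Alice, +1 555-010-9999"
)

index = run_plugins(PAYLOAD)
print(index)
```

Keeping each extractor as a separate, named plug-in means new data types can be indexed by registering another pattern, without touching the code that runs over the traffic.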
Below are some facts about XKeyScore:
Some facts and figures on XKeyScore
The system itself is secured: administrators and analysts can log in only through HTTPS in a browser or through public-key cryptography mechanisms.
Since it is difficult to track such a large number of users across the internet, XKeyScore makes use of the unique identifiers that websites and online services already assign in the form of cookies. So, even when many users visit a website from the same IP address, they can be told apart by their cookies. Likewise, even if users browse through a VPN or use some other IP-masking mechanism, XKeyScore can keep following them until the cookies are cleared from the browser.
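Why cookies beat IP addresses for telling users apart can be shown with a few hypothetical log entries. The IP addresses, the cookie name `uid` and its values are all made up for illustration; the sketch only uses the standard library's `http.cookies.SimpleCookie` to parse the Cookie header.

```python
from collections import defaultdict
from http.cookies import SimpleCookie

# Hypothetical observed requests: (source IP, Cookie header value).
REQUESTS = [
    ("203.0.113.7", "uid=a1"),
    ("203.0.113.7", "uid=b2"),   # a second user behind the same IP (e.g. shared NAT)
    ("198.51.100.9", "uid=a1"),  # the first user again, now via a VPN exit IP
]


def cookie_id(header):
    """Return the value of the hypothetical 'uid' cookie, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(header)
    return cookie["uid"].value if "uid" in cookie else None


users_per_ip = defaultdict(set)   # IP -> distinct cookie IDs seen from it
ips_per_user = defaultdict(set)   # cookie ID -> IPs it was seen from
for ip, header in REQUESTS:
    uid = cookie_id(header)
    users_per_ip[ip].add(uid)
    ips_per_user[uid].add(ip)

print(dict(users_per_ip))
print(dict(ips_per_user))
```

Grouping by IP merges two different people into one, while grouping by the cookie both separates them and links the same person across an IP change, which is exactly why clearing cookies breaks the link.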
The querying system of XKeyScore is built around a central website, and all the distributed systems are interconnected. When an analyst runs a query on the central website, the query runs recursively over all the systems and the results are fetched back. This approach is known as federated querying.
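A federated query of this kind can be sketched as a recursive fan-out over connected sites. Everything here is hypothetical (the `Site` class, the site names, the record format and the predicate); it only illustrates the general technique. Because the systems are interconnected rather than arranged in a strict tree, the sketch tracks visited sites so that a query never loops back forever.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical collection site with local records and links to peer sites."""
    name: str
    records: list
    peers: list = field(default_factory=list, repr=False)

    def query(self, predicate, visited=None):
        """Run the query locally, then forward it to every peer not yet visited."""
        visited = set() if visited is None else visited
        if self.name in visited:
            return []
        visited.add(self.name)
        hits = [(self.name, r) for r in self.records if predicate(r)]
        for peer in self.peers:
            hits.extend(peer.query(predicate, visited))
        return hits


# Hypothetical network: a central site linked to a few collection sites,
# including a link back to the central site that creates a cycle.
central = Site("central", [])
site_a = Site("site-a", [{"type": "email", "id": 1}, {"type": "chat", "id": 2}])
site_b = Site("site-b", [{"type": "email", "id": 3}])
site_c = Site("site-c", [{"type": "email", "id": 4}])
central.peers = [site_a, site_b]
site_b.peers = [site_c]
site_c.peers = [central]

hits = central.query(lambda record: record["type"] == "email")
print(hits)
```

The analyst only ever talks to the central site; the data itself stays where it was collected, and only matching records travel back.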
Now, can XKeyScore intercept encrypted traffic? Yes, but not completely. Take the case of an email. When an email is encrypted (for example with PGP), only the message body is encrypted. All other details, such as the sender's address, the receiver's address, the subject, and the date and time of the conversation, remain unencrypted. These are metadata and can be intercepted by XKeyScore analysts. The encrypted data is intercepted too, but it makes no sense because it is not in a readable format. However, email messages are stored on a remote server so that they can be accessed later. Even encrypted messages therefore remain stored and accessible on that server, where there is a chance they could eventually be cracked using powerful decryption tools.
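The split between readable metadata and an unreadable body can be seen by parsing a raw message with Python's standard `email` package. The addresses, subject and date below are hypothetical, and the body is a placeholder standing in for real PGP ciphertext; the point is only that the headers are plain text while the body is an opaque block.

```python
from email import message_from_string

# Hypothetical raw email whose body is PGP-encrypted (placeholder ciphertext).
RAW = """\
From: alice@example.com
To: bob@example.org
Subject: Quarterly numbers
Date: Mon, 02 Jun 2014 10:15:00 +0000
Content-Type: text/plain

-----BEGIN PGP MESSAGE-----
placeholder-ciphertext-not-real
-----END PGP MESSAGE-----
"""

msg = message_from_string(RAW)

# The headers (metadata) are readable to anyone who sees the message.
metadata = {name: msg[name] for name in ("From", "To", "Subject", "Date")}

# The body is only an armored ciphertext block without the recipient's key.
body = msg.get_payload()
body_is_encrypted = body.lstrip().startswith("-----BEGIN PGP MESSAGE-----")

print(metadata)
print(body_is_encrypted)
```

Who wrote to whom, when, and under what subject line is all available without breaking any encryption.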
The eavesdropping is not limited to web traffic; it applies to VoIP data as well, along with webcam pictures and snapshots, search history, Skype calls and chats, and more. Researchers have shown that anyone interested in online privacy, including people who use privacy tools such as Tor and Tails Linux, is easily targeted by XKeyScore.
Excerpts of the tool's source code have been released and are worth a look.
Authored by Priyanka Shetti
TCS Enterprise Security and Risk Management