From bsjohnso at midway.uchicago.edu Fri Sep 27 10:48:25 2002
From: bsjohnso at midway.uchicago.edu (Benjamin Johnson)
Date: Thu May 18 12:41:34 2006
Subject: [Cs22800] Schedule Conflict
Message-ID: <8F45350D-D230-11D6-9AA2-000393A75360@midway.uchicago.edu>

I would like to take this class; however, I need to take another class that meets MWF from 1:30 - 2:20. I would come at 2:30 and stay until we finish, and if I needed more time I would talk to Professor O'Donnell outside of the scheduled time period.

Does anyone else have any scheduling conflicts? The professor isn't against changing times if others have conflicting course times.

On another note, has anyone solidified what they're going to work on?

Thanks,
Ben

From bsjohnso at midway.uchicago.edu Sat Sep 28 21:50:45 2002
From: bsjohnso at midway.uchicago.edu (Benjamin Johnson)
Date: Thu May 18 12:41:34 2006
Subject: [Cs22800] My Project - Sebek
Message-ID: <406AE972-D356-11D6-A721-000393A75360@midway.uchicago.edu>

All,

I have decided to do Sebek, a project from the Honeynet Research Alliance (honeynet.org). I will basically be spending the next few days getting a more specific "assignment" from Edward Balas, Mike Clark and Lance Spitzner, the three guys in charge of the Sebek project. Below is a description of what Sebek is and where they want it to go. I will be trying to add features along with porting the project to systems other than Linux (and possibly testing it on less standardized / less popular Linux OSs).

If anyone is interested in possibly using Sebek as an individual CS228 project, or is possibly interested in working with me, don't hesitate to shoot me an e-mail.

Take care,
Ben

P.S. - Sebek is essentially a sophisticated and modified rootkit used to capture data on a honeypot (see honeynet.org for more info about honeypots / honeynets)

--------------------------

Hey all,

Here is my current brain dump on where I think we want to go with sebek. There are a number of open questions:

1. What applications other than SSH should we be looking to acquire data from? (mike and I have discussed SILC)

2. What data export methods should we go after? I have my thoughts, see if you can find them ;-) Also if anyone else has ideas, I am all ears.

3. For the other unix ports of sebek, what are our priorities? As the MS port will be a unique beast with high priority, I see this as a parallel task.

4. Do we need any other requirements for the sebek capture component?

--------------------------------------------------------------------

Overview:

Historically, Sebek was designed to collect forensic data that we could not obtain by simply examining packet captures. There were two types of data that were of interest: SSH session logs, and secure file transfers using SCP. Conceptually, Sebek allows us to selectively capture any data that enters kernel space from a process, and then export this data to a remote system using a covert communication channel.

There are three primary components to sebek: the component on the honeypot that captures data, the component that collects the exported captured data, and the component that processes the captured data.

Below is a list of requirements for the Capture component:

- The presence of Sebek on a honeypot must not be detectable by an intruder. Specifically, sebek files and processes should be hidden, and if possible sebek log files should not exist or should be of fixed size.

- The covert channel used by sebek must be either impossible for an intruder to detect, or sufficiently encrypted and obfuscated as to make it unlikely that an intruder will become alarmed at its presence.

- All flavors of sebek must use a common export format that facilitates the development of sebek log processing tools.

- Sebek implementations must provide the ability to configure what data we select for logging.

For now we won't worry about the requirements for the other 2 components in the system.
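
To make the last requirement (configurable selection of what gets logged) a bit more concrete, here is a minimal user-space sketch in Python. It is not Sebek code. The class and function names (ReadEvent, Signature, should_log) are hypothetical, and the fields are simply borrowed from the signature attributes listed under "Current Version of Sebek" below. The assumption is that a rule is a set of optional attribute values, where an unset attribute matches anything.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReadEvent:
    """Attributes of one intercepted read() call (hypothetical record)."""
    pid: int
    uid: int
    command: str
    fd: int
    tty: Optional[str]
    length: int


@dataclass(frozen=True)
class Signature:
    """A selection rule; a field left as None matches any value."""
    pid: Optional[int] = None
    uid: Optional[int] = None
    command: Optional[str] = None
    fd: Optional[int] = None
    tty: Optional[str] = None
    max_length: Optional[int] = None  # assumption: match reads up to this size

    def matches(self, ev: ReadEvent) -> bool:
        # Every attribute that is set must equal the event's value.
        exact = (
            (self.pid, ev.pid),
            (self.uid, ev.uid),
            (self.command, ev.command),
            (self.fd, ev.fd),
            (self.tty, ev.tty),
        )
        if any(want is not None and want != got for want, got in exact):
            return False
        return self.max_length is None or ev.length <= self.max_length


def should_log(ev: ReadEvent, signatures: Iterable[Signature]) -> bool:
    """Log the event if at least one configured signature matches it."""
    return any(sig.matches(ev) for sig in signatures)
```

Because the rules are plain data, the list passed to should_log could be swapped at runtime without touching the matching code, which is the property the requirement asks for.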
-------------------------------------------------------------------

Current Version of Sebek:

In its current version Sebek circumvents encryption by intercepting unencrypted data in kernel space that has been passed by an application using the read() call. We intercept this data by installing our own version of the read call in the system call table. Once the data is collected we store it temporarily in a ring buffer which is accessed via a special device. A user space application then reads the data and exports it on the LAN using spoofed header data and an encrypted payload. This version of sebek is based on the adore rootkit; we use the rootkit functionality to hide the device and the user space application.

Sebek does not collect all data that passes through the read call. Only data that matches a specific signature is exported. Currently these signatures are static and contain the following attributes:

- Process ID
- Process User ID
- Process Command Name
- File Descriptor
- TTY (if using)
- length of read data

Once the data is collected it is read from the device, encrypted and sent onto the LAN with forged headers. How the headers are forged is configurable, but all data looks like large UDP packets. The exporting application, sdm, even introduces a variable amount of inter-packet delay, and sends decoy packets when idle. However, this is currently the only method of getting data off the honeypot, and even though the data is obscured a savvy intruder will be suspicious of this traffic.

The final step in the process is to collect the exported data and to post process it. The central collection is done by sebeksniff, and the post processing is done by sbdump. Currently sbdump allows users to extract any file SCPed or extract interactive terminal logs from the sebek logs.

Current Limitations:

- The "signatures" used by sebek to collect data are not runtime configurable.

- The size of the ring buffer is not tunable at runtime.

- The rootkit isn't perfect and there are ways to determine if the exporting application is running, even if an intruder doesn't look at the network traffic.

- Linux is the only OS currently supported.

- The only "covert" channel implemented is via ethernet, where the presence of odd traffic can be seen by a skillful intruder.

------------------------------------------------------------------

Where do we want to go next?

Well, that's open for input, but there are a number of areas where I see we can improve sebek; most address the Current Limitations outlined above. The two highest priorities that I see are to port the collection side of sebek to support operating systems other than Linux, and to improve the overall covertness of sebek. The first is fairly straightforward conceptually; the second can lead us into all kinds of directions.

The covertness of sebek has 2 components: the hiding of the sebek components on the honeypot, and the hiding of the communication channel itself. The current communication channel is interesting, but anybody with a sniffer will realize that something is different from a normal box, even though we have really done our best to obscure everything about the true nature of the traffic. What we should focus on is ways of exporting the data where the intruder cannot determine that there is a change in system activity of any sort. I see a number of ways to do this:

1. Stick with packet based export, but trojan the calls used by libpcap to acquire packets such that if a certain attribute is set in the ethernet header the kernel just ignores that packet. As an example, if we make the export packets look like they are part of a specific vlan, we just add code to ignore packets with that tag. This does the following:

- intruder doesn't see network activity.
- we don't need to encrypt the packets because they can't be accessed, and thus we don't need a user space application because we can transmit the packets all in kernel space.

2. If we're using vmware or vmware GSX, this is probably the best way in my mind. Dump the data onto one of the serial ports and configure vmware to terminate the guest's serial interface into a log file on the vmware server. This does a number of things:

- intruder doesn't see network activity.
- eliminates the need for a user space process, because the kernel module writes directly to the serial port.
- eliminates the need for sebeksniff, because data is dumped directly to a file on the server.
- eliminates the need for the sebek device, because we can use the serial port to configure at runtime.

3. The really covert network based method. This technique uses funny ethernet frames that are abnormally large relative to their payload, while the IP or other packets inside the frames are all legitimate and even look good to ethereal for the most part. This method is really cool, but it still causes the macroscopic perspective of the network traffic to look unnatural, just like the current UDP export solution.

Once we settle on an approach(es) to data export we can look at other ways of making the components more covert, specifically eliminating the need for a user space process and, for that matter, reducing the need for a full blown rootkit.

It should be noted that we'll probably want multiple solutions here. For individuals doing vmware based Gen II honeypots I see the serial export option for sebek being the most ideal, as it is so easy. For general purposes I see option 1 as maybe a good route if we can make the network code fly.

The other thing we need to look at regarding sebek is what other applications we want to specifically "support". Currently we only address SSH and SCP; mike and I have talked about silc itself as a target of sebek affection ;-) Are there other applications we should be looking at?
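
Option 1 above hinges on recognizing export packets by a tag in the ethernet header and hiding them from anything that captures packets. As a rough illustration only, here is a user-space Python sketch of that check for an 802.1Q vlan tag. The real thing would live in the kernel's packet capture path, which this does not attempt to model. The vlan ID value and the function names are hypothetical; the frame layout (TPID 0x8100 at byte offset 12, vlan ID in the low 12 bits of the following 16-bit field) is standard 802.1Q.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100    # EtherType value that marks an 802.1Q vlan tag
EXPORT_VLAN_ID = 999   # hypothetical vlan ID reserved for export traffic


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the 802.1Q vlan ID of an ethernet frame, or None if untagged."""
    # 6 bytes destination MAC + 6 bytes source MAC, then 2 bytes TPID
    # and 2 bytes of tag control information (TCI).
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # low 12 bits of the TCI carry the vlan ID


def visible_to_sniffer(frame: bytes) -> bool:
    """False for frames carrying the export tag, True for everything else."""
    return vlan_id(frame) != EXPORT_VLAN_ID
```

A capture path that drops every frame for which visible_to_sniffer() returns False would hide the export traffic from a sniffer run on the honeypot itself, while frames on other vlans and untagged frames pass through unchanged.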

------------------------------------------------------------------

Ed