[Cs22800] My Project - Sebek

Sat Sep 28 21:50:45 CDT 2002

All,

I have decided to do Sebek, a project from the Honeynet Research 
Alliance (honeynet.org).

I will basically be spending the next few days getting a more specific 
"assignment" from Edward Balas, Mike Clark and Lance Spitzner, the 
three guys in charge of the Sebek project.

Below is a description of what Sebek is and where they want it to go.
I will be trying to add features and to port the project to systems
other than Linux (and possibly to test it on less standardized /
less popular Linux distributions).

If anyone is interested in using Sebek as an individual CS228
project, or in working with me, don't hesitate to shoot me an e-mail.

Take care,

Ben

P.S. - Sebek is essentially a sophisticated and modified rootkit used 
to capture data on a honeypot (see honeynet.org for more info about 
honeypots / honeynets)
--------------------------

Hey all,

Here is my current brain dump on where I think we want
to go with sebek.  There are a number of open questions:

1. What applications other than SSH should we be looking
    to acquire data from?  (Mike and I have discussed SILC)

2. What data export methods should we go after? I have my
    thoughts, see if you can find them ;-) Also if anyone
    else has ideas, I am all ears.

3. For the other Unix ports of sebek, what are our priorities?
    The MS port will be a unique beast with high priority, so
    I see it as a parallel task.

4. Do we need any other requirements for the sebek capture
    component?

--------------------------------------------------------------------

Overview :

Historically, Sebek was designed to collect forensic data that
we could not obtain by simply examining packet captures.  There
were two types of data that were of interest: SSH session logs,
and secure file transfers using SCP. Conceptually, Sebek allows
us to selectively capture any data that enters kernel space from
a process, and then export this data to a remote system
using a covert communication channel. There are three primary
components to sebek: the component on the honeypot that captures
data, the component that collects the exported captured data, and
the component that processes the captured data. Below is a list of
requirements for the Capture component:

  The presence of Sebek on a honeypot must not be detectable by
  an intruder. Specifically, sebek files and processes should be
  hidden and, if possible, sebek log files should either not exist
  or be of fixed size.

  The covert channel used by sebek must be either impossible to
  detect by an intruder or sufficiently encrypted and obfuscated as
  to make it unlikely that an intruder will become alarmed at its
  presence.

  All flavors of sebek must use a common export format that
  facilitates the development of sebek log processing tools.

  Sebek implementations must provide the ability to configure what
  data we select for logging.
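
As a rough illustration of the common export format requirement, here
is a minimal Python sketch of a fixed binary record header that any
capture port could emit and any log processing tool could parse.  The
field layout, the magic value and the names pack_record /
unpack_record are assumptions made up for this example; this is not
the actual Sebek export format.

```python
import struct

# Hypothetical layout (network byte order): magic, version, pid, uid, fd,
# 12-byte command name, payload length. Not the real Sebek wire format.
HEADER = struct.Struct("!HHIIi12sI")
MAGIC = 0x5EBE   # made-up marker for this sketch
VERSION = 1


def pack_record(pid, uid, fd, comm, data):
    """Serialize one captured read() into header + payload bytes."""
    name = comm.encode("ascii", "replace")[:12]
    return HEADER.pack(MAGIC, VERSION, pid, uid, fd, name, len(data)) + data


def unpack_record(blob):
    """Parse bytes produced by pack_record back into a dict."""
    magic, version, pid, uid, fd, name, length = HEADER.unpack_from(blob)
    if magic != MAGIC or version != VERSION:
        raise ValueError("not a record in this sketch's format")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return {"pid": pid, "uid": uid, "fd": fd,
            "comm": name.rstrip(b"\0").decode("ascii"),
            "data": payload}
```

Using an explicit byte order and fixed-width fields is what would let
ports on different operating systems produce identical records.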

For now we won't worry about the requirements for the other two
components in the system.

-------------------------------------------------------------------

Current Version of Sebek:

In its current version Sebek circumvents encryption by intercepting
unencrypted data in kernel space that an application requests
through the read() call.  We intercept this data by installing our
own version of the read call in the system call table.  Once the
data is collected we store it temporarily in a ring buffer, which is
accessed via a special device.  A user space application then reads
the data and exports it on the LAN using spoofed header data and an
encrypted payload. This version of sebek is based on the adore
rootkit; we use the rootkit functionality to hide the device and the
user space application.
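
To make the ring buffer step concrete, here is a small Python sketch
of a fixed-size byte ring with one producer (the hooked read call) and
one consumer (the user space exporter).  The real buffer lives in a
kernel module; the class name, the method names and the policy of
overwriting the oldest bytes when the buffer is full are all
assumptions for illustration.

```python
class RingBuffer:
    """Fixed-capacity byte ring; oldest bytes are overwritten when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._cap = capacity
        self._head = 0   # index of the oldest unread byte
        self._size = 0   # number of unread bytes

    def write(self, data):
        """Producer side: what the hooked read() would call."""
        for byte in data:
            tail = (self._head + self._size) % self._cap
            self._buf[tail] = byte
            if self._size == self._cap:
                # Buffer full: drop the oldest byte (assumed policy).
                self._head = (self._head + 1) % self._cap
            else:
                self._size += 1

    def read(self, n):
        """Consumer side: what the user space exporter would call."""
        n = min(n, self._size)
        out = bytes(self._buf[(self._head + i) % self._cap] for i in range(n))
        self._head = (self._head + n) % self._cap
        self._size -= n
        return out
```

A fixed capacity keeps memory use constant, at the cost of losing data
if the exporter falls behind.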

Sebek does not collect all data that passes through the read call.
Only data that matches a specific signature is exported.  Currently
these signatures are static and contain the following attributes:

- Process ID
- Process User ID
- Process Command Name
- File Descriptor
- TTY (if in use)
- Length of read data
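
The attributes above could be matched roughly as in the Python sketch
below.  The names ReadEvent, Signature and should_export are
hypothetical, the "None matches anything" convention is an assumption,
and treating the length attribute as an upper bound is my guess at how
it might be used, not something the description states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadEvent:
    pid: int
    uid: int
    comm: str
    fd: int
    tty: Optional[str]
    length: int


@dataclass(frozen=True)
class Signature:
    """None means 'match anything' for that attribute (assumed convention)."""
    pid: Optional[int] = None
    uid: Optional[int] = None
    comm: Optional[str] = None
    fd: Optional[int] = None
    tty: Optional[str] = None
    max_length: Optional[int] = None   # assumed: upper bound on read size

    def matches(self, ev: ReadEvent) -> bool:
        exact = (("pid", ev.pid), ("uid", ev.uid), ("comm", ev.comm),
                 ("fd", ev.fd), ("tty", ev.tty))
        for field, value in exact:
            wanted = getattr(self, field)
            if wanted is not None and wanted != value:
                return False
        return self.max_length is None or ev.length <= self.max_length


def should_export(ev, signatures):
    """Export a read only if at least one signature matches it."""
    return any(sig.matches(ev) for sig in signatures)
```

For example, Signature(comm="sshd", max_length=1) would select
single-byte reads by sshd and ignore everything else.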

Once the data is collected it is read from the device, encrypted and
sent onto the LAN with forged headers.  How the headers are forged
is configurable, but all data looks like large UDP packets.  The
exporting application, sdm, even introduces a variable amount of
inter-packet delay, and sends decoy packets when idle. However, this
is currently the only method of getting data off the honeypot, and
even though the data is obscured, a savvy intruder will be suspicious
of this traffic.
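
The variable delay and decoy behaviour can be sketched in Python as
below.  This is not how sdm is implemented: the fixed packet size, the
2-byte length prefix, the delay bounds and the function names are all
assumptions, and encryption and header forging are left out entirely.
The sketch only plans what would be sent and when.

```python
import os
import random

PACKET_SIZE = 1024  # assumed fixed size so real and decoy packets look alike


def pad(payload, size=PACKET_SIZE):
    """Prefix a 2-byte length, then pad with random bytes up to size."""
    if len(payload) > size - 2:
        raise ValueError("payload too large for one packet")
    filler = os.urandom(size - 2 - len(payload))
    return len(payload).to_bytes(2, "big") + payload + filler


def unpad(packet):
    """Recover the payload; a decoy decodes to b''."""
    length = int.from_bytes(packet[:2], "big")
    return packet[2:2 + length]


def plan(queue, slots, rng, min_delay=0.05, max_delay=0.5):
    """Return (delay_seconds, packet) pairs for the next `slots` sends.

    Real records are sent while the queue has data; otherwise a decoy
    (empty payload, all filler) is sent so the link is never silent.
    """
    out = []
    pending = list(queue)
    for _ in range(slots):
        payload = pending.pop(0) if pending else b""
        out.append((rng.uniform(min_delay, max_delay), pad(payload)))
    return out
```

A caller would pass something like random.Random() as rng and sleep
for each delay before sending the corresponding packet.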

The final step in the process is to collect the exported data and to
post-process it. The central collection is done by sebeksniff, and
the post-processing is done by sbdump.  Currently sbdump allows
users to extract any file transferred with SCP, or to extract
interactive terminal logs, from the sebek logs.

Current Limitations:

The "signatures" used by sebek to collect data are not runtime
configurable.

The size of the ring buffer is not tunable at runtime.

The rootkit isn't perfect and there are ways to determine if the
exporting application is running, even if an intruder doesn't look at
the network traffic.

Linux is the only OS currently supported.

The only "covert" channel implemented is via Ethernet, where the
presence of odd traffic can be seen by a skillful intruder.

------------------------------------------------------------------
Where do we want to go next?

Well, that's open for input, but there are a number of areas where I
see we can improve sebek; most address the Current Limitations
outlined above.

The two highest priorities that I see are to port the collection
side of sebek to support operating systems other than Linux, and to
improve the overall covertness of sebek. The first is fairly
straightforward conceptually; the second can lead us in all
kinds of directions.

The covertness of sebek has two components: hiding the components
of sebek on the honeypot, and hiding the communication channel
itself.

The current communication channel is interesting, but anybody with
a sniffer will realize that something is different from a normal
box, even though we have really done our best to obscure everything
about the true nature of the traffic. What we should focus on is
ways of exporting the data where the intruder cannot determine that
there is a change in system activity of any sort.  I see a number of
ways to do this:

1.  Stick with packet based export, but trojan the calls used
     by libpcap to acquire packets such that if a certain attribute
     is set in the Ethernet header the kernel just ignores that
     packet.  As an example, if we make the export packets look like
     they are part of a specific VLAN, we just add code to ignore
     packets with that tag. This does the following:

     - intruder doesn't see network activity.

     - we don't need to encrypt the packets because they can't be
       accessed, and we don't need a user space application
       because we can transmit the packets all in kernel space.

2.  If we're using VMware or VMware GSX, this is probably the best
     way in my mind. Dump the data onto one of the serial ports and
     configure VMware to terminate the guest's serial interface into
     a log file on the VMware server. This does a number of things:

     - intruder doesn't see network activity.

     - eliminates the need for a user space process, because the
       kernel module writes directly to the serial port.

     - eliminates the need for sebeksniff, because data is dumped
       directly to a file on the server.

     - eliminates the need for the sebek device, because we can use
       the serial port to configure at runtime.

3.  The really covert network based method.  This technique uses
     funny Ethernet frames that are abnormally large relative to
     their payload, while the IP or other packets inside the frames
     are all legitimate and even look good to ethereal for the most
     part.  This method is really cool, but still causes the
     macroscopic perspective of the network traffic to look
     unnatural, just as the current UDP export solution does.
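
The VLAN idea in option 1 comes down to a filter on the 802.1Q tag.
The Python sketch below shows only that check, on raw frame bytes in
user space; the real change would sit in the kernel's packet capture
path.  The VLAN ID value and the function names are made up for this
example.

```python
import struct

TPID_8021Q = 0x8100       # standard 802.1Q tag protocol identifier
HIDDEN_VLAN_ID = 0x0ABC   # made-up VLAN ID reserved for export traffic


def vlan_id(frame):
    """Return the 802.1Q VLAN ID of an Ethernet frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 0-11 are the destination and source MACs; the tag follows.
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF   # low 12 bits of the TCI are the VLAN ID


def visible_to_sniffer(frames):
    """Drop frames carrying the hidden VLAN ID, pass everything else."""
    return [f for f in frames if vlan_id(f) != HIDDEN_VLAN_ID]
```

Untagged frames and frames on other VLANs pass through unchanged, so
ordinary traffic would still look normal to a sniffer on the honeypot.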

Once we settle on an approach (or approaches) to data export we can
look at other ways of making the components more covert, specifically
eliminating the need for a user space process and, for that matter,
reducing the need for a full blown rootkit.  It should be noted that
we'll probably want multiple solutions here.

For individuals doing VMware based Gen II honeypots I see the serial
export option for sebek as the most ideal, as it is so easy.
For general purposes I see option 1 as maybe a good route if we
can make the network code fly.

The other thing we need to look at regarding sebek is what other
applications we want to specifically "support".  Currently we only
address SSH and SCP.  Mike and I have talked about SILC itself as
a target of sebek affection ;-)  Are there other applications we
should be looking at?

------------------------------------------------------------------

Ed