Thursday, July 26, 2012

Searching for a packet capturing and interface manipulation library for Python...

I needed a script that would monitor network traffic and capture and process only DHCP traffic. It turned out I couldn't find such a script, so I decided to write one (more about that script in another post). As the language, I decided to use Python. That was the easy part. Now I had to decide which libraries to use for capturing network traffic, decoding DHCP requests and responses, and manipulating IP addresses on interfaces.

I started with network traffic capturing. The pcap library is the standard library for network capture, so it was natural for me to search for a Python interface to it. I found several such interfaces: pcap, pylibpcap, pypcap, and pcapy. There is also an interface written specifically for Python 3, py3kcap. While searching for pcap interfaces, three other Python libraries popped up: libdnet (here is the old project page), dpkt, and scapy.

But not all of these libraries are equal, nor do they serve the same purpose. libdnet allows sending packets and manipulating the kernel's routing tables, firewall, and ARP cache. So, besides Ethernet and IP, it doesn't offer much in terms of supported protocols. dpkt, on the other hand, is made for exactly this purpose: it supports easy creation and parsing of the different TCP/IP protocols. Finally, Scapy is a Swiss Army knife of network manipulation. It offers a shell in which one can manipulate packets, but it can also be used from other scripts. Unfortunately, while browsing the Scapy source I realized that it uses the os.popen interface to call external programs. That alone was enough for me to eliminate Scapy from further consideration.

The next elimination criterion was the availability of packages in CentOS and Fedora. I try to stick to prepackaged software as much as possible, so a quick search (yum search) showed that both CentOS 6 and Fedora 17 have packages for pcapy and dpkt (named python-dpkt). For some reason dnet itself is packaged, but its Python interface isn't. I found this bugzilla entry, but without any answer!

So, I settled on pcapy and dpkt. The only piece of the puzzle still missing was how to manipulate interface addresses. I stumbled on netifaces, which allows me to obtain information about interfaces, and also on this post for Windows. But all the results I got were about how to obtain an IP address, not how to set one. In the end, I gave up and decided that I'll try to use libdnet even though I'll have to compile it from source. Either that, or I'll use raw sockets and ioctls, which are accessible from Python using the standard library.
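Just to illustrate how pcapy and dpkt would fit together, here is a minimal sketch of the capture loop I had in mind. It assumes the usual pcapy calls (open_live, setfilter, loop) and dpkt's ethernet/dhcp modules; the interface name eth0 is only an example:
import pcapy
import dpkt

def handle_packet(header, data):
    # Walk down the layers: Ethernet -> IP -> UDP -> DHCP payload
    eth = dpkt.ethernet.Ethernet(data)
    udp = eth.data.data
    dhcp = dpkt.dhcp.DHCP(udp.data)
    print("DHCP op=%d xid=0x%x" % (dhcp.op, dhcp.xid))

# interface, snaplen, promiscuous mode, read timeout in ms
reader = pcapy.open_live("eth0", 65535, 1, 100)
reader.setfilter("udp and (port 67 or port 68)")  # BPF filter: DHCP only
reader.loop(-1, handle_packet)                    # -1 means loop forever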

Finally, as a curiosity, I'll mention that there is also a Python interface to iptables, python-iptables, which is also packaged for Fedora.

Tuesday, July 24, 2012

ntop 5 on CentOS 6...

Last week I decided to install ntop on one of my CentOS 6 machines and, much to my surprise, it turned out that there is no ntop package in the standard CentOS 6 repositories (i.e. Base, EPEL, RPMFusion). Then I looked into the Fedora repository and it turned out that there is a package, but for an older version, 4.0 (the newest version of ntop at the time this post was written was 5.0). So, I downloaded that older source package, dropped in the new ntop version, modified the SPEC file a bit and tried to build it. It didn't work immediately, but after a few more tweaks it did. I filed a bug report in Red Hat's bugzilla so that the maintainer can upgrade the package, if he wishes to.

In the meantime, I decided to build a package for CentOS 6. The main problem is that Fedora introduced systemd instead of the traditional SysV init used by CentOS. To cut a long story short, I managed to do that too. The resulting SPEC file can be used on both Fedora and EPEL6. I uploaded the new SPEC file (and init script) to bugzilla, so you can fetch them there if you wish.

Until the maintainers decide what, if anything, to do with it, here are the SRPM file and the resulting binary RPM file for 64-bit CentOS 6.
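If you prefer to rebuild the binary RPM yourself instead of using mine, rebuilding the SRPM should be enough; a rough sketch (the exact SRPM file name will differ):
$ yum-builddep ntop-*.src.rpm       # pull in the build dependencies (yum-builddep is in yum-utils)
$ rpmbuild --rebuild ntop-*.src.rpm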

Friday, July 20, 2012

A case against wizards...

Well, I mean those configuration wizards that let you quickly set something up and get going.

They have their advantages, but also disadvantages. In my opinion, one big disadvantage is that they take away one very important thing from you: making mistakes. Yes, because we learn by making mistakes, and if everything goes right, we haven't learned much. In the short term you win, but in the long term I think you lose. Namely, when something doesn't go right - and things have a huge tendency not to go right - the person who has made a lot of mistakes will, I think, be more efficient at solving the problem than the one who used a wizard and has no clue what can go wrong.

So, what would be the conclusion? Well, I think you should first try the harder way, and only after you have mastered it, take shortcuts to be as quick as possible.

Integrating FreeIPA and Alfresco...

Having described how to install CentOS, DNS and reverse DNS, FreeIPA, and Alfresco, in this post I'm going to describe how to integrate Alfresco with FreeIPA. I want to achieve the following goals with the integration:
  • Users and groups are kept within FreeIPA, and authentication is done by FreeIPA.
  • The Alfresco Web interface honors Kerberos tickets. Upon opening the Web interface, users are immediately presented with their pages without having to authenticate (if, of course, they have valid Kerberos tickets).
  • Authentication when mounting the DAV share is also done via Kerberos tickets.
In short, I want to achieve SSO (Single Sign-On) as much as possible. Users sign in once, when they start to use their workstations, and that's the only time they have to enter a password.

Pearls of our clueless journalists 5...

So, I ran into another interesting article. This one tries to explain how Grum was taken down. I must admit I had to strain quite a bit to understand what the poet was trying to say.

First, the article starts with a claim along the lines of "the servers were knocked down"?! Knocked down how, exactly? Onto the floor, or what? OK, not to philosophize too much: it should have said they were disabled, most likely simply by being switched off. :) It doesn't get simpler than that. :)

Then, the term "Grum robot"? I'm not sure whether the word "robot" is an attempt to translate the word "botnet". Possibly, although I don't see the connection between a robot and a botnet, unless this is some new and innovative use of the word "robot". However, if it is an attempted translation, then the consistency is exactly zero, because already in the second part of the text the phrase "Command and Control serverima" is used, which is neither English nor Croatian. And if it isn't a translation, then it's a total lack of understanding of the topic being written about.

And while we're at translations, there is also the phrase "successfully taken offline". Hm, it sounds as if they walked up to the servers and struggled for hours to switch them off.

The same goes for the phrase "third strongest". I would rather say "largest", or better yet, "third largest". In this case the measure is the number of computers, and much less their strength - if at all.

And finally, this paragraph I simply couldn't understand, because it is essentially complete nonsense (roughly translated):
The Grum robot worked with the so-called Command and Control servers, which were a kind of central station from which malware spread to the infected computers, which in turn provided access to the CnC servers for sending large quantities of messages, all without the knowledge of the users themselves.
To conclude: this time it's not a very well-known newspaper, and probably not a well-known journalist either, but that doesn't excuse them. Also, if they are already translating from English, they should translate with understanding!

Other posts from this "series" can be found here.

Querying SNORT SQL database

When SNORT stores its data in an SQL database, the obvious question is how to get the data you would otherwise have in the plain log files generated by SNORT. So, here is what I have managed to deduce so far (note that this post will be extended as I learn more). In case you have a comment/addition/correction, please post a comment on this post. That is especially true for the SQL queries, as I'm not an expert in that area and some of them might be suboptimal.

A few introductory words


To try the following examples you need a working instance of a MySQL database and a SNORT that logs into the database (directly or via barnyard2). If you have that, run the mysql command line client (or some equivalent) and select the SNORT database. You are now ready to go...

This post was written against schema version 107. To find out which version of the schema you have, run the following query:
mysql> select * from `schema`;
+------+---------------------+
| vseq | ctime               |
+------+---------------------+
|  107 | 2012-07-10 10:20:52 |
+------+---------------------+
1 row in set (0.00 sec)
Note the backticks! Namely, schema is a MySQL reserved word and if you don't use backticks, MySQL will report a syntax error! Alternatively, you can use the database.tablename syntax to avoid the table name being treated as a reserved word.

Finally, because of screen size constraints, I'm limiting the output more often than not; here is what you'll see in that regard:
  1. In SELECT statements, I'm using the LIMIT N keyword to get only the first N rows.
  2. I explicitly enumerate the fields to be returned in SELECT statements instead of using a star (i.e. SELECT column1,column2 instead of SELECT *).
  3. I also use the LEFT() function to limit the number of characters retrieved from VARCHAR and similarly typed columns.

Examples of queries


The first thing you probably want to find out is how many alerts there were on a certain day, e.g. on July 10th, 2012. This is easy; just run the following query:
mysql> select count(*) from event where timestamp between '2012-07-10' and '2012-07-11';
+----------+
| count(*) |
+----------+
|    12313 |
+----------+
1 row in set (0.01 sec)
Two things you should note about this query:
  1. All generated events are stored in the table event. The column timestamp stores the time at which an event was generated.
  2. To select a date range I'm using the BETWEEN ... AND keywords. I'm also shortening the typing by providing only a date, while the time is assumed to be 00:00:00, so this query basically catches anything on July 10th, 2012, as requested.
I could equally well have used the following query:
select count(*) from event where date(timestamp)='2012-07-10';
to get the same result, but in case I want a range instead of a single day, the syntax using the BETWEEN keyword is better.
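For example, a single BETWEEN covers an arbitrary range of days; the dates here are purely illustrative:
mysql> select count(*) from event where timestamp between '2012-07-01' and '2012-07-08';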

To get the number of events generated on the current day, use the following query:
mysql> select count(*) from event where date(timestamp)=date(now());
+----------+
| count(*) |
+----------+
|      178 |
+----------+
1 row in set (0.13 sec)
Note that we are using the function NOW() to get the current time and then we just extract the date using the DATE() function.
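The same idea works for relative days; for instance, a count for yesterday could look like this (a sketch using MySQL's INTERVAL arithmetic):
mysql> select count(*) from event where date(timestamp)=date(now() - interval 1 day);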

While we are at the table event, here is its structure:
mysql> show columns from event;
+-----------+------------------+------+-----+---------+-------+
| Field     | Type             | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+-------+
| sid       | int(10) unsigned | NO   | PRI | NULL    |       |
| cid       | int(10) unsigned | NO   | PRI | NULL    |       |
| signature | int(10) unsigned | NO   | MUL | NULL    |       |
| timestamp | datetime         | NO   | MUL | NULL    |       |
+-----------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
Only the timestamp column contains data in this table; the other columns are links to other tables, as follows:
  1. sid and cid are links to the packet data, i.e. IP/TCP/UDP headers and associated data. Those are placed in separate tables which we'll talk about later.
  2. signature is a link (foreign key) to the sig_id column of the signature table.
Ok, what about finding out the number of events per day? Well, easy again; the following SELECT statement will do that:
mysql> select count(*),date(timestamp) as count from event group by date(timestamp);
+----------+------------+
| count(*) | count      |
+----------+------------+
|    11689 | 2012-06-28 |
|    17904 | 2012-06-29 |
|     4353 | 2012-06-30 |
|     4322 | 2012-07-01 |
|    14198 | 2012-07-02 |
|     2977 | 2012-07-03 |
|    12313 | 2012-07-10 |
|    13014 | 2012-07-11 |
|     9126 | 2012-07-12 |
|     2642 | 2012-07-17 |
|     1527 | 2012-07-19 |
+----------+------------+
11 rows in set (0.07 sec)
I could use an ORDER BY clause to get the day with the largest number of alerts; otherwise the rows are sorted by day. In this case I used the DATE() function to chop off the time part of the timestamp; otherwise, I would get alerts broken down by individual timestamps.
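For example, to list the busiest days first, it is enough to alias the count and order by it (a straightforward variation of the query above):
mysql> select count(*) as cnt,date(timestamp) from event group by date(timestamp) order by cnt desc;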

Ok, let's move on. What about finding out all the types of events that occurred, or in other words, all the signatures? Well, the signatures that SNORT generates are stored in the table signature, and a simple query on this table will tell us which signatures have been triggered so far:
mysql> select sig_id,sig_name from signature;
+--------+-----------------------------------------------------------------------+
| sig_id | sig_name                                                              |
+--------+-----------------------------------------------------------------------+
|      1 | SCAN UPnP service discover attempt                                    |
|      2 | stream5: TCP Small Segment Threshold Exceeded                         |
|      3 | http_inspect: NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE |
|      4 | http_inspect: MESSAGE WITH INVALID CONTENT-LENGTH OR CHUNK SIZE       |
|      5 | stream5: Reset outside window                                         |
|      6 | ssh: Protocol mismatch                                                |
+--------+-----------------------------------------------------------------------+
6 rows in set (0.00 sec)
All in all, our SNORT instance has generated six different signatures so far. The table signature has the following structure:
mysql> show columns from signature;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| sig_id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| sig_name     | varchar(255)     | NO   | MUL | NULL    |                |
| sig_class_id | int(10) unsigned | NO   | MUL | NULL    |                |
| sig_priority | int(10) unsigned | YES  |     | NULL    |                |
| sig_rev      | int(10) unsigned | YES  |     | NULL    |                |
| sig_sid      | int(10) unsigned | YES  |     | NULL    |                |
| sig_gid      | int(10) unsigned | YES  |     | NULL    |                |
+--------------+------------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
The columns are:
  1. sig_id is the primary key of this table.
  2. sig_name is the textual representation of the signature.
  3. sig_class_id is a foreign key into the sig_class table, i.e. the classification of the signature.
  4. sig_priority is the priority assigned to the signature.
  5. sig_rev is the revision of the rule that produced the signature.
  6. sig_sid is the rule's SNORT ID (the sid field from the rule itself).
  7. sig_gid is the generator ID (the gid field, identifying which SNORT component generated the alert).
Ok, the next thing you might want to know is how many times each alert was generated. To achieve this, use the following SQL query:
mysql> select sig_id,left(sig_name,30),count(*) from signature as s, event as e where s.sig_id=e.signature group by sig_name;
+--------+--------------------------------+----------+
| sig_id | left(sig_name,30)              | count(*) |
+--------+--------------------------------+----------+
|      4 | http_inspect: MESSAGE WITH INV |      109 |
|      3 | http_inspect: NO CONTENT-LENGT |      198 |
|      1 | SCAN UPnP service discover att |    55440 |
|      6 | ssh: Protocol mismatch         |     2360 |
|      5 | stream5: Reset outside window  |    33698 |
|      2 | stream5: TCP Small Segment Thr |      971 |
+--------+--------------------------------+----------+
6 rows in set (0.23 sec)
We had to do a join across two tables, signature and event. As you can see, I got the specific signatures together with their counts. Furthermore, I could order them so that I have the most frequent ones on top (or at the bottom). Also, note that I'm using the LEFT() function to make the output short enough to fit this post.
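As an example, ordering the same join by frequency, most frequent first:
mysql> select sig_id,left(sig_name,30),count(*) as cnt from signature as s, event as e where s.sig_id=e.signature group by sig_name order by cnt desc;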

Ok, what about finding the number of signatures generated on a specific day, say, today? Well, this is the same as the previous query; we only have to add one more condition, namely that rows from the table event are taken into account only if the timestamp is from today:
mysql> select sig_id,left(sig_name,30),count(*) from signature as s, event as e where s.sig_id=e.signature and date(e.timestamp)=date(now()) group by sig_name;
+--------+--------------------------------+----------+
| sig_id | left(sig_name,30)              | count(*) |
+--------+--------------------------------+----------+
|      6 | ssh: Protocol mismatch         |      226 |
|      5 | stream5: Reset outside window  |        2 |
|      2 | stream5: TCP Small Segment Thr |       40 |
+--------+--------------------------------+----------+
3 rows in set (0.14 sec)
Easy; the only difference from the previous query is the added date(e.timestamp)=date(now()) condition. Now, let us move on. Suppose we want to know which hosts generated the packets that triggered alerts. In order to do that we have to include the table iphdr in the query. The table iphdr contains data from the IP header of the captured packet. So, run the following SELECT statement:
mysql> select signature,count(*) as cnt,inet_ntoa(ip_src) from event,iphdr where event.cid=iphdr.cid and event.sid=iphdr.sid group by ip_src order by cnt;
+-----------+-------+-------------------+
| signature | cnt   | inet_ntoa(ip_src) |
+-----------+-------+-------------------+
|         3 |     1 | 192.168.1.44      |
|         5 |     1 | 192.168.1.89      |
|         5 |     1 | 192.168.1.27      |
|         5 |     1 | 192.168.1.5       |
|         5 |     1 | 192.168.1.120     |
|         5 |     1 | 192.168.0.21      |
+-----------+-------+-------------------+
6 rows in set (0.0 sec)
Ok, I have the source IP addresses, each with the total number of alerts (cnt) it triggered. Note that IP addresses are kept in decimal form, so they have to be converted into dotted form using MySQL's INET_NTOA() function.
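The reverse function, INET_ATON(), is handy when filtering on a specific address, since it converts the constant once instead of converting the whole column; for example (address purely illustrative):
mysql> select count(*) from iphdr where ip_src=inet_aton('192.168.1.44');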

Here is the structure of iphdr table:
mysql> show columns from iphdr;
+----------+----------------------+------+-----+---------+-------+
| Field    | Type                 | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+-------+
| sid      | int(10) unsigned     | NO   | PRI | NULL    |       |
| cid      | int(10) unsigned     | NO   | PRI | NULL    |       |
| ip_src   | int(10) unsigned     | NO   | MUL | NULL    |       |
| ip_dst   | int(10) unsigned     | NO   | MUL | NULL    |       |
| ip_ver   | tinyint(3) unsigned  | YES  |     | NULL    |       |
| ip_hlen  | tinyint(3) unsigned  | YES  |     | NULL    |       |
| ip_tos   | tinyint(3) unsigned  | YES  |     | NULL    |       |
| ip_len   | smallint(5) unsigned | YES  |     | NULL    |       |
| ip_id    | smallint(5) unsigned | YES  |     | NULL    |       |
| ip_flags | tinyint(3) unsigned  | YES  |     | NULL    |       |
| ip_off   | smallint(5) unsigned | YES  |     | NULL    |       |
| ip_ttl   | tinyint(3) unsigned  | YES  |     | NULL    |       |
| ip_proto | tinyint(3) unsigned  | NO   |     | NULL    |       |
| ip_csum  | smallint(5) unsigned | YES  |     | NULL    |       |
+----------+----------------------+------+-----+---------+-------+
14 rows in set (0.00 sec)
The sid and cid columns are the connection to the event table, as well as to the tcphdr and udphdr tables. The rest of the columns contain data from the IP header. For example, ip_ver contains the IP version, so you can check how many IP protocol versions triggered alerts:
mysql> select ip_ver,count(*) from iphdr group by ip_ver;
+--------+----------+
| ip_ver | count(*) |
+--------+----------+
|      4 |    92445 |
+--------+----------+
1 row in set (0.04 sec)
In my case, it was only IPv4. We can do the same with the other fields, for example to see which transport layer protocols were observed:
mysql> select ip_proto,count(*) from iphdr group by ip_proto;
+----------+----------+
| ip_proto | count(*) |
+----------+----------+
|        6 |    43076 |
|       17 |    49785 |
+----------+----------+
2 rows in set (0.04 sec)
Obviously, only two: UDP (protocol number 17) and TCP (protocol number 6). BTW, you can look those numbers up in the /etc/protocols file on any Linux machine, or you can go to IANA.

To see all source IP addresses that triggered alerts, we can use the following query:
mysql> select inet_ntoa(ip_src),count(*) from iphdr group by ip_src limit 5;
+-------------------+----------+
| inet_ntoa(ip_src) | count(*) |
+-------------------+----------+
| 10.61.34.152      |       20 |
| 85.214.67.247     |        2 |
| 134.108.44.54     |        2 |
| 192.168.5.71      |       10 |
| 192.168.102.150   |     2130 |
+-------------------+----------+
5 rows in set (0.00 sec)
Now, it can turn out that there are some IP addresses we didn't expect, and we want to know when and what happened. Take for example the address 10.61.34.152 from the above output; let's see what this address generated:
mysql> select inet_ntoa(ip_src),inet_ntoa(ip_dst),count(*) from iphdr where inet_ntoa(iphdr.ip_src)='10.61.34.152' group by ip_dst;
+-------------------+-------------------+----------+
| inet_ntoa(ip_src) | inet_ntoa(ip_dst) | count(*) |
+-------------------+-------------------+----------+
| 10.61.34.152      | 239.255.255.250   |       20 |
+-------------------+-------------------+----------+
1 row in set (0.03 sec)
Using this query we see that all the packets were destined to the address 239.255.255.250. With a bit of grouping by date:
mysql> select date(timestamp),count(*) from event,iphdr where (event.cid,event.sid)=(iphdr.cid,iphdr.sid) and inet_ntoa(ip_src)='10.61.34.152' group by date(timestamp);
+-----------------+----------+
| date(timestamp) | count(*) |
+-----------------+----------+
| 2012-07-02      |       20 |
+-----------------+----------+
1 row in set (0.03 sec)
we see that all the events were generated on the same day. And what was the alert?
mysql> select signature.sig_name,count(*) from signature,event,iphdr where (event.cid,event.sid)=(iphdr.cid,iphdr.sid) and inet_ntoa(ip_src)='10.61.34.152' and event.signature=signature.sig_id group by sig_id;
+------------------------------------+----------+
| sig_name                           | count(*) |
+------------------------------------+----------+
| SCAN UPnP service discover attempt |       20 |
+------------------------------------+----------+
1 row in set (0.84 sec)
Well, they were all UPnP service discovery requests.

One thing that is interesting, at least to me, is who sent ICMP Echo Request messages on the network. This is easy to determine using the following query:
mysql> select inet_ntoa(iphdr.ip_src) as SRC,inet_ntoa(iphdr.ip_dst) as DST,timestamp from event,iphdr,icmphdr where (icmphdr.sid,icmphdr.cid)=(event.sid,event.cid) and (iphdr.sid,iphdr.cid)=(event.sid,event.cid) and icmp_type=8 limit 3;
+-------------+--------------+---------------------+
| SRC         | DST          | timestamp           |
+-------------+--------------+---------------------+
| 192.168.1.8 | 192.168.1.55 | 2012-07-20 11:05:01 |
| 192.168.1.8 | 192.168.1.55 | 2012-07-20 11:05:01 |
| 192.168.1.8 | 192.168.1.55 | 2012-07-20 11:05:02 |
+-------------+--------------+---------------------+
3 rows in set (0.00 sec)

Obviously, the host with address 192.168.1.8 sent probes to the host 192.168.1.55.
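To get totals per source/destination pair instead of individual packets, the same three-table join can simply be grouped; a sketch along the lines of the queries above:
mysql> select inet_ntoa(iphdr.ip_src) as SRC,inet_ntoa(iphdr.ip_dst) as DST,count(*) from event,iphdr,icmphdr where (icmphdr.sid,icmphdr.cid)=(event.sid,event.cid) and (iphdr.sid,iphdr.cid)=(event.sid,event.cid) and icmp_type=8 group by ip_src,ip_dst;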

So much for now. Detailed information about the DB schema used by SNORT can be found at this link.

In the end, my impression is that it is definitely much easier and more efficient to gather statistics using an SQL database than plain files, but that it is best to use some tool that has all those queries predefined and to fall back to raw SQL only when you have some very specific requirement.

Thursday, July 19, 2012

Basic permissions on Linux/Unix operating systems...

On Linux there are three systems for controlling access to objects (e.g. files, processes, shared memory):
  1. The basic Unix permissions, which originated with the first Unix and are today supported by all Unix clones.
  2. ACLs, which are a more advanced variant of the basic Unix permissions, i.e. an extension of them.
  3. SELinux permissions, which are specific to Linux and considerably more complex than the previous two.
This post covers only the basic Unix permissions, which are relatively simple to understand and use. Before reading it, I recommend reading the post about users and groups.

As mentioned in the post about users and groups, when you list a file using the ls command, you get something like this:
$ ls -l mozilla.pdf
-rw-r--r--. 1 sgros zemris 40360 Vel 27 21:30 mozilla.pdf
The owner in this case is the user sgros, while the group is zemris. The string of characters at the beginning holds the permissions, starting from the second character, the letter r. The first character is the type of the file system object; a minus there means a regular file - which is the case here.

The permissions come in groups of three characters (i.e. three bits, since each character is recorded using a single bit). In this case the permission groups are rw-, then r--, and finally r--. The first group (rw-) applies when the owner of the file tries to access it. The second group (r--) applies when a user who is a member of the group the file belongs to tries to access it. Finally, the third group (r--) applies to all other users.

As said, each group contains three bits. The first bit indicates whether the given user, group, or others have the right to read the file (if they do, the letter r is shown; if not, a minus). The second letter indicates whether they have the right to write to the file (the letter w, or a minus if not), and finally the third letter indicates whether the file is executable (the letter x), i.e. whether it can be run as a program, or not (a minus).

So, in the example above, the owner of the file (user sgros) can read the file (the first letter r in the group rw-), can write to it (the letter w in the group rw-), and cannot execute it, i.e. it is not an executable file (the final minus in the group rw-). All members of the group can only read the file and can neither write to it nor execute it (because the permission is r--), and finally, the others, just like the group members, can only read the file.

Note that a file can have, for example, the following permissions: ---rw-r-x. In this case, the owner of the file has no rights over it at all (his permission group is ---), the group can read and write, and everybody else can read and execute the file. If the owner tries to write to the file, or to execute it, he will not be allowed to. So, for the owner only the first three bits are considered, for the group the second three, and for everybody else the third three. There is no carrying over from group to others, or from owner to group and then to others.

One more thing worth pointing out is that for directories the bits have a different meaning:
  1. The r bit means the directory can be read, i.e. the list of files in the directory can be retrieved.
  2. The w bit means new files can be added to the directory.
  3. The x bit means the metadata of individual files can be accessed (metadata being size, blocks on disk, etc.).
The most common mode for a directory is rwxr-xr-x, or rwx------ for more restrictive setups. However, there is also the following variant: rwx-----x. Note that for the others the x bit is set, but not the r and w bits. That means the others cannot read the contents of the directory, nor can they change it, but if they know the name of a file inside the directory, they can access it. Web servers use this to access users' personal Web pages.

Some user commands for managing the basic permissions


Permissions are changed using the chmod(1) command:
chmod <permissions> <files and/or directories>
The first argument, the permissions, can be given symbolically or numerically. When given symbolically, you state which permissions are added to (or removed from) which class of users. For example, o+rx gives read and execute rights to all others. On the other hand, if the numeric notation is used, the absolute values of the access bits are set. For example, 700 will set the permissions to rwx------, and 755 will set rwxr-xr-x.
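A couple of concrete invocations, just to make the two notations tangible (the file names are of course arbitrary):
$ chmod o+rx report.pdf        # symbolic: add read and execute for others
$ chmod 755 ~/public_html      # numeric: rwxr-xr-x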

Users and groups in the Linux operating system...

The concepts of a user and a group in the Linux operating system are prerequisites for understanding the way the OS separates and controls its users and their objects (processes, files, shared memory, and so on).

Both a group and a user on a Linux OS (and on any other OS) have to be created for the OS to know about them. This is done during the installation of the operating system, or the administrator creates them as needed. At creation time the administrator chooses the user name (for example sgros) or the group name (for example zavod). For a user, the administrator additionally has to set at least a password, while for a group the group members are usually also defined.

Storing information about users and groups


The list of all users on Unix operating systems is kept in the file /etc/passwd, while the list of all groups (and group members) is in the file /etc/group. This holds as long as no advanced authentication and authorization methods are used, in which case users can also be defined elsewhere.

However, keep in mind that as far as the operating system is concerned, the group or user name is not what matters; what matters are their identification numbers. For a user that is the UID (User IDentifier), and for a group the GID (Group IDentifier). Tools that manipulate user names and groups automatically translate the symbolic name into the numeric one and back, as needed.

An example of a few lines from the /etc/passwd file:
$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
sgros:x:5056:1000:Stjepan Gros:/home/zavod/sgros:/bin/bash
Each line defines one user, and the lines consist of fields. The fields are separated by colons and each field has a precisely defined meaning:
  1. The first field is the user name, i.e. the symbolic name.
  2. The second field is the password; however, for security reasons passwords are stored in a separate file with restricted access.
  3. The third field is the UID; for example, from the listing above we can see that the first user is privileged while the second one is not.
  4. The fourth field is the user's primary group, given via its GID. Namely, every user has a primary group which is defined in the /etc/passwd file itself, but a user can also be a member of other groups, as we'll see in the group example.
  5. The fifth field is the so-called GECOS field and is purely informational. For example, the user's first and last name are stored there.
  6. The sixth field is the user's home directory.
  7. The seventh field is the shell the user uses.
Details about the structure of the passwd file can be found in the passwd(5) man page.

An example of a few groups from the /etc/group file:
$ cat /etc/group
root:x:0:
wbpriv:x:88:squid,apache
Again, each line defines one group, and the lines consist of records (fields) separated by colons. The fields are:
  1. The first field is the group name.
  2. The second field is the group password. Again, for security reasons it is stored in a separate file. However, group passwords are very rarely used nowadays.
  3. The third field is the GID.
  4. The fourth field is the comma-separated list of users who are members of the group. Note that a user does not have to be listed here if the group is his primary group. In the example above it looks as if the group root has no members, but the user root has root defined as his primary group and in that way he is a member of it.
Again, the details can be found in the corresponding man page.

As many people know, on Linux there is a special user that has all privileges, root. However, it is more precise to say that Linux gives special treatment to UID 0, not to the user name. That means it is very easy to add a new user name, set its UID to zero, and it will have all the privileges of the root user.

The file system


Every object on the file system has an associated user and group. Most often, that is the user who created the file, with the group set to that user's primary group. However, other combinations are of course possible.

When we list a directory using the ls(1) command we immediately get the information about which user and group own a file:
$ ls -l mozilla.pdf
-rw-r--r--. 1 sgros zemris 40360 Vel 27 21:30 mozilla.pdf
In the example above, the owner of the file mozilla.pdf is the user sgros, while the group the file belongs to is zemris. These two pieces of information are essential when determining access rights to files and directories (which I'll cover in a separate post).

Some user commands for working with users and groups


To find out his UID, GID, and group memberships, a user runs the id(1) command:
$ id
uid=5056(sgros) gid=1000(zemris) groups=1000(zemris),959(davfs2),968(wireshark)
Note that this is not the full output; there is one more part (context) related to SELinux, which I'm deliberately ignoring. Anyway, the command tells me that my UID is 5056 (with the user name sgros bound to that UID) and that my primary group is zemris (whose GID is 1000). Finally, I'm a member of the groups zemris, davfs2, and wireshark.

Changing the owner and the group of a file system object (i.e. a directory or a file) is done using the chown(1) and chgrp(1) commands. However, since that opens up possibilities for abuse, these commands are restricted: chown can be used only by the system administrator, while chgrp additionally allows the owner of a file to change its group to another group he himself is a member of.

Wednesday, July 18, 2012

Research paper: "Lessons from the PSTN for Dependable Computing"

I came across this paper while reading about self-healing systems. The authors (Enriquez, Brown, Patterson) analyze FCC disruption reports in order to find the causes of faults in the PSTN. Additionally, the PSTN is a large and complex network, and the experience gained from maintaining it can certainly help a lot in maintaining the Internet infrastructure.

I'll emphasize the following key points from this paper that I found interesting:
  • PSTN operators are required to file a disruption report when 30,000 people are affected and/or the disruption is longer than 30 minutes. There is a screenshot of the report in the paper, even though it can probably be downloaded from the FCC site. But it seems that the reports themselves are not publicly available?
  • They analyzed reports from the year 2000. There is a reference in the paper to an older, similar analysis.
  • They used three metrics for comparison: number of outages, customer minutes, and blocked calls. Number of outages is a simple count of outages; customer minutes is the product of the duration and the total number of customers affected (regardless of whether they tried to make a call during the disruption). Finally, blocked calls is the product of the duration and the number of customers who actually tried to make a call during the disruption.
  • The prevailing cause of disruption is human error, more than 50% in every case. Human error is further subdivided into errors made by persons affiliated in some way with the operator and errors made by those who are not. Those affiliated with the operator cause the larger share of the disruptions.

ASLR to extreme

I was reading about Artificial Immune Systems (more about that in another post), and one of the papers stated that biological systems increase resiliency through diversity. Furthermore, it gave a counterexample from computer networks, in which Internet Explorer (at the time the paper was written) had a 90% market share. It's obvious that when something hits IE, it hits almost the whole Internet. That isn't diversity by any standard.

I think we have such problems with security in general that we need some new, radical solution. We are probably a long way from that solution, but it occurred to me that this is exactly what is necessary: diversity that will prevent attackers from compromising single computers and, through them, large parts of the Internet. Still, it is hard to expect that there will be N producers of operating systems, then N browsers, etc. It's not easy to produce those; it takes a long time and huge resources. Now, biological systems are much, much older, and theoretically there could be such diversity in some distant future. IMHO, this is questionable, and as I said, it lies in some distant future, which is why it is beside the point. What we need is something that works now.

If you think about it a bit, what we need is a mutation that will change computer systems from the bottom up in unpredictable ways. By the bottom I mean the parts of a single application, while by the top I mean complex systems consisting of computers and networks. Furthermore, this mutation has to be specific to each system, so that there are hardly two similar systems in existence. So, for example, the computer you work on isn't similar to any other computer in use, and, as you use it, it evolves and mutates.

Now, why did I mention Address Space Layout Randomization (ASLR) in the title? Because it seems to me to be a step in the direction of totally mutating everything. Namely, ASLR mutates the address space of a process, thus making it unpredictable for attackers and making each system different. Unfortunately, this mutation is restricted because it is too coarse-grained: you move whole libraries, but not functions, or even the blocks of code from which functions are built.

Of course there are problems. For a start, similarity is key to maintaining systems. Companies with a large number of computers try hard to make them identical, just to lower maintenance costs. Not only that, developers count on similarity to be able to reproduce bugs, and consequently to correct them. So, those requirements should either be preserved in such a new system (which is partly contradictory) or new ways of achieving the same effect (i.e. maintainability) should be found.

Finally, the mutation has to be dynamic. Namely, even if an attacker gets into one system, or a part of a system, he needs time to discover the other parts. If the mutation is quick enough, the knowledge the attacker obtains will be worthless before he manages to use it. Not only that, but potentially what he has already achieved will soon evaporate.

Pearls of our clueless journalists 4...

So, I just learned about a newspaper article, in one of our thoroughly un-sensationalist newspapers, which claims that an essay written by a citizen of Croatia knocked Google off its feet! I'm not sure whether Google has recovered from that shock in the meantime and managed to get back up, but I certainly haven't, and I think it will take me quite a while to come to my senses.

Other posts from this "series" can be found here.

Tuesday, July 17, 2012

Zimbra log cluttering...

When you run Zimbra, the logs it generates are duplicated in the system log files (i.e. /var/log/messages or /var/log/secure) as well as in the Zimbra-specific log files (i.e. /var/log/zimbra.log). The problem with this is that it clutters the system log files, i.e. takes unnecessary space and makes them hard to analyze. So, it would be good to make Zimbra log only into its own log files.

Googling for a solution, I found this post, but with no satisfactory answer. Since there was no ready-made solution, I turned to googling how to configure rsyslog to do that. Namely, CentOS (on which I'm running Zimbra) uses rsyslog as a replacement for the more traditional syslog. It turns out it is possible to filter according to the application doing the logging.

So, the two offending applications are zimbramon and zmmailboxdmgr. In order to prevent them from logging into /var/log/messages, add the following lines to /etc/rsyslog.conf, before the RULES section (the one that resembles classical syslog rules):
if $programname == 'zimbramon' then /var/log/zimbra-stats.log
& ~

if $programname == 'zmmailboxdmgr' then /var/log/zimbra-stats.log
& ~
The first two lines redirect zimbramon messages to /var/log/zimbra-stats.log, while the second two do the same for zmmailboxdmgr. The & ~ lines tell rsyslog to discard a message that matched the preceding filter, so it never reaches /var/log/messages. Don't forget to restart rsyslog afterwards for the change to take effect.

Salaries and the unions...

There has been a lot of talk these days about lowering salaries and/or abolishing benefits in the public sector. And indeed, if you follow the various discussions a little, it immediately becomes obvious how polarized things are. On one side, public sector employees (and I count education there as well) protest that their salaries must not be cut because they have already been cut, and besides, they do essential work without which the state could not function. On the other side we have people who work for private employers (in the private sector), have no unions and almost no rights compared to the public service, and find it tragic when they hear what rights and benefits public sector employees have. And the sparks immediately start to fly.

The truth is, as usual, somewhere in the middle, but it is also true that the idea and life goal of a large number of Croats is to wedge themselves into a government job and do nothing!

I work in higher education, which means I'm effectively living off the state. Not only that, I also consider science and higher education - and education in general - essential for the future of the country, something that should be looked after and carefully developed. However, when you take into account everything that counts as higher education in Croatia, I think the salaries are more than good. Not only that, they are also regular and the job is more or less secure (with the exception of teaching and associate positions at the universities, but I'll ignore that for now)! Also, I fully agree that the average salary should be tied to GDP or to the average salary in the country, so if the country goes down the drain together with its private sector, which is what actually generates the income, then public sector employees should head in the same direction. Of course, if the country's income grows, then its employees' salaries should grow too.

Given all that, I absolutely disagree with Ribić and his insistence, in the negotiations with the Government, on various conditions related to salaries and/or various bonuses. Why? Because it seems to me that Ribić (like our unions in general) lives in some past times and has absolutely no vision of what he wants to achieve.

And what is the problem? The problem is a system that is in complete disarray! I could go on and on listing the problems of the system, especially the one I work in, but the same holds for any state system. Those systems are set up so that it makes no difference whether someone works or not. If a clerk notices a problem, he won't bother to fix it, for the simple reason that he will only run into obstacles, lose his nerves, and in the end probably achieve nothing. And if things go wrong, he can also lose his precious job. If someone has an idea, he won't be motivated to realize it; in fact, ideas aren't encouraged at all. I could keep listing like this forever, but I can summarize that part of the problem very simply: there is no system of rewards and penalties. In addition, top experts don't work in the public service. They don't, for the simple reason that they can go to the private sector, or abroad, and have more than excellent salaries and be valued far more! Only the enthusiasts who haven't (yet) learned that everything (or at least a lot) comes down to material things stay in that system.

So, those are the problems that need solving! Not the salaries, which are only the tip of the iceberg. If there were a reward system, those who are good could also earn well, and I think nobody would complain that the average base salary is small. But there you go, our unions only see how to "fight" for 500 kuna more or less, and that's it. What good are 500 kuna more in my salary when the system is such that it leaks on all sides?

The tragedy is that we have politicians who are chosen for loyalty rather than competence. The unions are a parallel structure that could be a counterweight to that, but how, when they are the same, when they too are chosen for loyalty and idleness?!

Anyway, I don't care about those negotiations at all, because I know nothing will change, either now or in the future.

Saturday, July 14, 2012

VMWare Workstation DNS server...

I just figured out a very interesting thing about the DNS server used by VMware virtual machines. It uses the /etc/hosts file of the host machine to resolve names. The trick is that those names have to have the .localdomain suffix. So, for example, if you have the following entry in the /etc/hosts file of the host machine:
1.1.1.1        test
and then, within the guest machine, you ask nslookup for that name, you'll receive:
# nslookup test
Server: 192.168.178.2
Address: 192.168.178.2#53

Name: test.localdomain
Address: 1.1.1.1
Note that localdomain is automatically appended. This is specified by the search directive in /etc/resolv.conf of the guest machine (there is a line search localdomain or something similar). Otherwise, without localdomain, the name wouldn't be found:
# nslookup test.
Server: 192.168.178.2
Address: 192.168.178.2#53

** server can't find test.: NXDOMAIN
In this example, the dot I appended after the name prevents anything from being appended, so only the bare name test is looked up.

To conclude, the rule is simple: if the name being looked up ends in localdomain, then it is searched for in the /etc/hosts file of the host operating system.

I discovered this by accident. Namely, I did an nslookup within a guest machine expecting an error message about a non-existent host name, but I got a valid response!? At first I was confused, and it took me several minutes to figure out what was happening. Well, that's how things go when you don't read manuals. ;)

Wednesday, July 11, 2012

Colors in terminal...

I've been using the terminal for such a long time, and I never thought about the number of colors it supports. Probably because I never use them. But it turns out that the number is very low, only 8 colors. You can check it using the following command:
tput colors
You'll probably get the number 8, for 8 colors. I found this out while reading the feature list for Fedora 18. Even more interestingly, they mention that Mac OS X Lion supports 256 colors in the terminal by default.

So, naturally, I decided to see whether I can turn on support for 256 colors, and it seems to work. In order to have that many colors you first have to select an appropriate terminal type. That one was easy; it turns out there is a terminal definition under the name xterm-256color. So, set your terminal to that value, i.e.:
export TERM=xterm-256color
And you can use 256 colors. :) Ok, but how do you check that there really are 256 colors available? One way is to use the tput command again:
$ tput colors
256
But how to see what those colors look like? With a bit of googling, it turns out that the background color can be set with the following command:
tput setab NUM
That command will set the background to color number NUM (setaf sets the foreground color instead). So, a small for loop will print all 256 colors:
for i in {0..255}; do tput setab $i; echo -n "    "; done; tput setab 0; echo
Actually, the last tput and echo commands are there to bring the terminal back to a usable state. :)

Security of Croatian Web sites...

I'm afraid that nowadays everyone wants to be an administrator and have their own Web site. That in itself wouldn't be a problem if a good portion of those wannabe administrators didn't actually carry out their intentions. But there you go, they do.

You're surely wondering why I'm writing about this and why it is so bad. Well, I just ran into an example while browsing Zone-H: the Web site joomla.upi.geof.hr was hacked on April 6th, 2012. Today, July 11th, 2012, when I came across that information, the site is still defaced! The same goes for www.upi.geof.hr, which sits on the same IP address. In short, it seems nobody cares!

As a digression, I can mention that a year or so ago I sent an e-mail to an administrator to inform him that his Web site had been broken into. I never received a reply, and as far as I could tell, the break-in was never cleaned up either!

Anyway, the first example prompted me to investigate a bit which Croatian sites have been hacked. Basically, I just entered the following query into Google: "hacked by site:hr" and got a considerable number of sites:
  • http://cib2009.grad.hr/
  • http://www.zamirnet.hr/unija47/
  • http://www.lukoc.com.hr/
  • http://udruga-slijepih.hr/
  • http://www.ffos.hr/katedre/knjiznicarstvo/studij/plan.php?studij=5
    (a nice SQL injection is involved there)
  • http://www.cvjecarna-amalija.hr/
  • ...
I also noticed that quite a few blogs have been hacked: about 98,000 on bloger.hr, 22,600 on blog.hr, and finally about 1,350 on blog.dnevnik.hr. Not bad; it seems their software isn't exactly the strongest, not to say it's pitiful. :) All in all, over 140 thousand results in the .hr domain alone.

In fact, all of this might look harmless, but it really isn't. Malicious code can be planted on hacked pages so that visitors get infected. Once the attackers infect the computers, a whole range of other possibilities opens up for them, one of which is the theft of private data. Furthermore, hacked pages can be just the first step towards taking over the machine completely and using it as a stepping stone for new attacks!

Tuesday, July 10, 2012

Linux and Canon ImageRunner 2520

I just lost two or three hours trying to make this printer work! The problem was that I was using the wrong drivers. There are several drivers out there, but this one worked! The symptom with the other drivers was that the data went to the printer but nothing happened, just as if it ended up in /dev/null! And in the logs (/var/log/cups/error_log and /var/log/messages) there was nothing at all! This was unbelievably frustrating.

Additionally, what confused me was the selection of supported printing languages, i.e. PostScript, PCL and PXL (or something similar). I didn't have the slightest clue which one is supported by the printer I have. Naturally, I tried to see whether it is written somewhere on the printer. No luck! I also tried to find the model number; again, no luck. Then I tried to find that information through the Web interface; again, no luck! Is it so hard to write that information down somewhere if you already offer different PPD files?!

Especially interesting was the driver package CQue 2.0-3 (that was the one that didn't work!), which has a GUI for setting up a printer. But half of this GUI isn't functional; e.g. you can select only the Queue tab (which is actually selected by default). Clicking the Next button does nothing; still, from the main menu you can select the next tab, Ports. After that, you cannot select anything anymore. I was so impressed with this application that I had a strong urge to beat the guy who wrote it!

So, after downloading the archive with the working driver and unpacking it, you'll find an RPM directory with two files:
cndrvcups-common-2.40-1.x86_64.rpm
cndrvcups-ufr2-uk-2.40-1.x86_64.rpm
Just install them. Then, open the URL localhost:631 in a Web browser and start adding the printer as usual. I selected AppDirect for communication with the printer (LPD would probably work too). During model selection, note that there are PPD files for this exact model (look for Canon iR2520 UFRII LT ver.2.2 (en)), but they are listed after the old drivers for the ImageRUNNER printer series (i.e. the original ones that come with CUPS, which, BTW, don't support this exact model). You could probably also do this whole setup using GNOME's printer management application, but I didn't try. What's more, after all this annoyance, I didn't want to try anything else!

So, that's it, this nightmare is finally over for me. Note also that, because I only need printing functionality, I didn't try to set up the scanner embedded in the printer, but my first impression from a quick look around Google is that it isn't going to be easy, if it's possible at all.

And one more digression for the end. Why isn't it possible to open the printer's Web interface and simply download the printer driver from there? And when you go to Canon's Web site and select a printer to download a driver for, you are presented with a HUGE number of options. That is madness for me, who supposedly knows how to administer computers! I can't imagine what a nightmare it is for someone who knows a lot less...
