Showing posts with label science. Show all posts

Thursday, May 18, 2017

What is R&D according to OECD

In my previous post I gave my personal opinion on what R&D is. In this post I'm going to analyze the definition given by the OECD, which can be argued to be a relevant authority on such topics. For decades the OECD has produced a document called the Frascati Manual, which deals with collecting and reporting data about R&D. The latest version is from 2015, and that one is used as the basis for this post. The manual, in Chapter 2, describes what R&D is. Basically, it says that an R&D activity has to be (paragraph 2.7):
  1. novel,
  2. creative,
  3. uncertain,
  4. systematic, and
  5. transferable and/or reproducible.
and an activity has to satisfy all of those properties to be regarded as an R&D activity.

The property of novelty can be correlated with properties 1 and 2 given in the post with my opinion. The following citations from the manual are interesting or important:
  1. In the Business enterprise sector, the potential novelty of R&D projects has to be assessed by comparison with the existing stock of knowledge in the industry. [paragraph 2.15]
  2. The R&D activity within the project must result in findings that are new to the business and not already in use in the industry. [paragraph 2.15]
Those two citations mean that if you do something that anyone already does, or that anyone can do in a relatively short period of time, then it's not a product of R&D activity.

The property of creativity, i.e. that the results of activities are based on original, not obvious, concepts and hypotheses, can be correlated with property 2 given in the post with my opinion. The following excerpt is interesting:
An R&D project requires the contribution of a researcher!
This means that whoever is doing R&D has to have trained researchers on staff.

The property of uncertainty, i.e. that the final outcome is uncertain, is directly related to property 5 in that post. The difference is that the OECD publication claims that there are multiple dimensions to this property:
For R&D in general, there is uncertainty about the costs, or time, needed to achieve the expected results, as well as about whether its objectives can be achieved to any degree at all. [paragraph 2.18]
Furthermore, there is a criterion for discriminating between R&D and non-R&D activities:
Uncertainty is a key criterion when making a distinction between R&D prototyping (models used to test technical concepts and technologies with a high risk of failure, in terms of applicability) and non-R&D prototyping (preproduction units used to obtain technical or legal certifications). [paragraph 2.18]
So, the more certain you are that there will be some functionality in the final product, the less it is an R&D activity!

The systematic property of R&D, i.e. that it is planned and budgeted, correlates with property 4 I gave in the previous post. This also includes keeping records, not only planning.

The final property, i.e. that the activity leads to results that could possibly be reproduced (transferable and/or reproducible), is the most interesting one, and I didn't include it in the elaboration of my opinion. Namely, this requires that the results be published somewhere so that the conclusions can be independently verified. Somehow, it seems to me that this is the least frequently satisfied property, if for no other reason than because the scientific output of companies is very small. Someone could object here that companies publish elsewhere, so why count only scientific output? The point is that by scientific output I'm referring to the way the results are published, not to where they are published. In other words, a scientific publication includes all the information necessary for someone else to test the results.


Finally, let me note that there is another important subdivision of R&D according to the OECD publication (paragraph 2.9):

  1. basic research,
  2. applied research, and
  3. experimental development.

I'll write about those in some future post.

Thursday, April 13, 2017

What is R&D and why should SMEs have one?

In this post I would like to describe what R&D is. This is a continuation of the more general idea of cooperation between industry, academia and government that I wrote about in the previous post. By describing what R&D is, I also hope to answer the other part of the post's title: why SMEs should have one. In doing so, I'm not going to give formal definitions for now, only my opinion; the definitions I'll leave for another post. Before continuing, I must stress again that I'm not an expert on this subject, nor do I represent my employer. As such, this is purely my opinion, which might be completely wrong. That said, obviously, I don't believe I'm wrong in general, though I accept that some ideas might not be well thought out.

I'll start by enumerating several intuitive properties I expect from R&D, looking from the perspective of a company having or wanting to have one:
  1. It adds some new value which can be monetized in some way.
  2. The new value should not be easy to obtain.
  3. It is a medium-term process.
  4. It is a process that should be done methodologically with clearly defined steps and goals.
  5. There is uncertainty as to whether there will be positive results, or any results at all.
  6. It is a continuous process.
  7. The R&D process requires investment.
Note that when we discuss whether something is R&D, and whether something produced is a product of R&D, we are not requiring that all properties hold; it is enough for a majority to hold! Now, let me discuss each property in a bit more detail.

First, there is the property that it adds some new value which can be monetized in some way. I think that this one is obvious. Everything a company does has the goal of improving profits. Now, maybe it would be more correct to say that everything a company does serves to fulfill its mission. This view of helping fulfill the mission actually broadens the potential topics that can be covered by R&D, since some results that don't necessarily produce money can also be covered. But, to be more down to earth, the company is there to make profits, and if it doesn't do that, then it ceases to exist. So, R&D should support this. I will refer to the purpose of directly increasing profits as R&D in a narrower sense, while R&D in a broader sense supports the company's mission. I could theorize further that R&D in a broader sense is more expensive, with less direct ROI, and thus more suitable for large enterprises, while R&D in a narrower sense is more suitable for SMEs. Nevertheless, in the following text I'll concentrate only on R&D in a narrower sense unless I explicitly say otherwise.

The next property of R&D is that the new value produced should not be easy to obtain. In other words, if the output of some supposedly R&D activity is something that anyone can come up with immediately, then it's not the product of R&D activity, and probably there is no R&D activity at all. This property is desirable for the simple reason that it helps a company keep its competitive edge. The more a single company has something that others don't have, the more competitive it is, and likely the more successful. But there is a but. Namely, some outputs of R&D are complex and others are deceptively simple. The advantage of having complex products is intuitively easy for anyone to understand, but the simple ones need some clarification. Namely, they are indeed simple, but the process of generating them isn't. You can find examples of such output everywhere. How many times have you learned something and your first reaction was: How did I not think of that!? Well, that's because the process of reaching it is hard, even though the output itself is simple. Now, it is obvious that copying simple stuff is easy, and the patent system was invented to prevent that.

The third property is somewhat related to the previous one, i.e. R&D is a medium-term process. The reason is that a short-term process is less likely to produce something that fulfills the previous property of results not being easy to replicate, and the other properties are also harder to achieve. In other words, if you invest only a brief time into the development of something, in general you can expect only simple results. On the other hand, having long-term projects allows one to obtain very good, deep, well thought out results, but in a fast-paced world it is entirely possible that the results, once obtained, are useless. It also might happen that in the meantime, due to not having any results, the company doing R&D fails and vanishes from the market. So, the key is to have a process that is long enough to produce useful results, but not so long that these results become useless. Finding the sweet spot is more of an art than a science.

The fourth property, that the R&D process should be done methodologically with clearly defined steps and goals, basically means that certain steps have to be present. For example, goals or requirements must be defined in order to be able to assess whether the result meets the goals set at the beginning or not. Then, there has to be an exploratory step of studying existing work, i.e. repeating what has already been done is definitely not something that leads to good R&D, or R&D at all. If nothing else, where is the added value required by one of the previous properties? Even worse, it might happen that the results obtained are worse than what others have achieved, and potentially use. After all, the current state of the art was reached through a lot of investment - in terms of time and money. Not to mention that there is a problem with patents, and it doesn't matter whether you copied something or invented it on your own. If it is patented, you cannot use it without the consent of the patent holder! To continue with the steps that should be present in R&D activity, we also have to mention the evaluation of proposed solutions. It is mandatory. This can be done by experiments, simulations, etc. The evaluation must be done in a rigorous way so that it is beyond reasonable doubt that the proposed solutions do indeed lead to better results. I'll stop here because I intend to write more about this topic in a separate post that deals with the issue of how to establish R&D.

The fifth property is uncertainty as to whether there will be positive results, or results at all. There is a reason why it is called research and development. If it were not so, then it would be engineering. Note that people sometimes confuse the uncertainty of building a new product that could fail on the market with the uncertainty of research results. The two are independent and might interplay in several ways. What we are talking about here is that when doing R&D it might happen that the ideas or goals turn out to be infeasible. But this has nothing to do with the question of whether the ideas and goals, if feasible, will be successful on the market or not. Take for example an idea about a system that would allow the replacement of programmers. This goal isn't achievable, and no R&D activity would be able to produce something like that. But if it were achievable, it would certainly be a huge commercial success. So, care should be taken not to confuse uncertainty of research results with uncertainty of market success.

Finally, the sixth property, continuity of the R&D process, is something that should be satisfied in order for R&D to be useful. This follows from the ever-changing environment and improving competition. If some company does one-shot R&D, this could help the company in the short run, but in the long run there will be no benefit from having R&D. So, just as a company has to continuously adapt to the state of the environment, R&D has to be there to support the necessary changes. There is also one additional reason for a continuous R&D process. Namely, it is rather expensive to establish an R&D process, so the payoffs are better if R&D is established and allowed to function continuously.

The seventh property I added later, after a colleague of mine read the post and commented that the R&D process is expensive. After some thought I decided to phrase it differently, namely that the R&D process requires investment. I'm still not certain whether this should be a separate property or not, because I believe it is implied by the combination of the previous properties. In the end, I decided to put it as a separate property, just in case. I should clarify this property a bit. Namely, everything is expensive when we talk about activities in a company, but the expense is outweighed by earnings which are immediate, either direct (e.g. selling a product to a customer) or indirect (e.g. bookkeeping activities). R&D is different in two aspects. First, it requires investment with returns coming only later and in the long run - if there is a result from R&D at all (property 5).

In conclusion, I listed six (seven) properties that can be used to determine whether some company is doing R&D or only has something it thinks is R&D. Those are probably not the only properties, and if you have any to add (or you think that some of those listed above are not important), please comment and provide your arguments. Anyway, in many cases of R&D in companies probably not all of the listed properties will be present, but as I said in the introduction, a majority will do. Maybe we can also talk about R&D maturity, i.e. the more properties are present, the more mature the process is. But I'll leave this for another post.


Wednesday, April 5, 2017

Cooperation between industry, academia and government

This is the first in a series of posts (I hope) that will deal with research and development in small and medium enterprises. The reason for my interest in this topic will be clear after I describe a bit how I got into this. And before I start, let me clearly state that I'm not an expert in economics, management, or even the question of what science is. Everything I say is my personal view at the moment I wrote the blog posts and has nothing to do with anyone else. In particular, it is not the official position of the Faculty or the University.

I work at the Faculty of Electrical Engineering, University of Zagreb. My firm belief is that no university can be successful in the long run without being part of a prosperous environment. The converse also holds, i.e. a local economy cannot be competitive and successful without the support of a good university and colleges. To give an example that supports this attitude, Stanford wouldn't be what it is without the brilliant leadership of Fred Terman, whose vision helped create Silicon Valley. In essence, he created a successful local environment that in turn helped Stanford, and the circle was closed.

Yes, we live in a global, highly connected world, and any student can work wherever she/he wants; the same goes for me. Furthermore, anyone can come to Croatia and work here, at least in principle. I can also cooperate with anyone in the world I wish. After all, exactly that is supported by the EU through different programs, most notably Horizon 2020, which encourages EU companies and universities to cooperate. This is good, and necessary, but it is not perfect for one simple reason, and that is the question of who is paying me, and who is paying for the education of students coming to my university. The answer isn't so global; it is actually very local. All of that is paid by taxpayers in Croatia, and taxpayers are individuals and companies living and existing in Croatia!

With all that said, I think it is very important for the local economy to grow, and I must do as much as I can to help local companies grow and develop, for mutual benefit. And more importantly, I think that everyone in Croatia, working in companies or at universities, has to see things in this way.

Now we come to the question of how to help. The answer is actually quite straightforward: I should do what I'm supposed to do at the University, i.e. research. Companies should cooperate with and contract universities for research in order to become more efficient and to have better and more competitive products and services. The truth is that not many companies have enough resources for research and development. It is a risky and expensive endeavor. So, companies should rely on the University and on EU funding. Namely, the University provides research resources, and the EU, through its funding, takes on a part of the risk. Of all the funding available, I'll concentrate on one specific kind, the one that supports Smart specialization, for several reasons:
  1. I was directly involved in one segment of its preparation.
  2. I'm involved in applications for several projects.
  3. It tries to connect universities and commercial sector.
  4. It isn't meant for large pan-European projects, but projects within a single country. 
Three years ago I was involved in the development of the Smart specialization strategy (S3) of the Republic of Croatia. This involvement lasted a bit less than two years. Smart specialization is actually something defined by the European Commission, which stated that each country (or region) has to specialize in something in order for the EU to be competitive on the global market in the long term. Of course, specialization has to be supported by the current economy and, obviously, it has to be focused. Now, I'm not aware of what other countries did, nor did I spend too much time searching around, so what I'm going to write is probably specific to Croatia, and even more specific to cyber security (one of the subareas selected for specialization in Croatia, and the one where most of my work is done). One of the goals of S3 is to encourage the commercial, academic and government sectors to cooperate. This should in turn make the commercial sector more competitive.

I'm somehow under the impression that there was much talk about S3 while it was being developed, but now that the strategy is defined and we have to implement it, there are not many events, if there are any (apart from those of the Ministry of Commerce, which actually handles all activities related to S3). For example, I'm not aware of a single round table, workshop, conference or anything else organized by anyone concerning S3: how it is progressing, whether we have learnt something, what can be done better, etc.

In the following posts I want to delve more into the following very important topics:
  1. What is R&D and why should SMEs have one?
  2. How to have R&D?
  3. How to get ideas on what to R&D?
  4. How I think companies behave with respect to S3, and in general towards EU projects.

Wednesday, September 24, 2014

Anonymous paper reviews and threat of a legal action

I just stumbled on a news story in which a scientist claims that his career was severely damaged by anonymous comments on some of his works published on PubPeer. This is a very interesting story to follow, for several reasons.

For a start, PubPeer is a site for post-publication review. I strongly support such a practice because I believe that everything has to be scrutinized and tested. It helps authors, who can get the best possible feedback, but it also helps society in general, because there is an ever-increasing problem with scientific ethics. As a side note, I was, and still am, a big proponent of doing the review process in public. That, in my opinion, significantly increases transparency. Anyway, PubPeer fulfils my wishes, but unfortunately for me, it is only concerned with papers from medicine, chemistry and related fields, not from computer science.

In this particular case, the problem is that the author was offered a job at the University of Mississippi, with quite a large annual salary, and for that purpose he quit his current job. The University then revoked the offer, and so he lost both the new job and his current job. Now, he claims that the reason for this is some anonymous negative comments on PubPeer, and he threatens a lawsuit, asking for the identities of those who made those negative claims.

While, as I said, it is very good to have such a site, it doesn't mean that everything should be allowed, more specifically:
  1. Any claims made have to be justified. Unfortunately, anonymity also allows people to make damaging or unjustified claims by being certain that there will be no repercussions.
  2. Unfortunately, a negative claim, even if not justified, casts doubt, so that might be a problem.
  3. In this particular case it is also unknown why the author didn't respond to the presented claims about problems in his paper. PubPeer claims they invite the first and last authors to respond to comments.
  4. Finally, no one should take lightly claims about some paper being invalid, not good, etc. In this particular case, I hope that the University of Mississippi verified the negative claims and didn't take lightly what some anonymous commenters said.
In any case, we'll see what happens with this particular case.

Friday, December 6, 2013

Modeling a simple system using multi agent simulation environments

Note: This isn't finished yet, but because I'm referencing this post in another post, I decided to publish it.

I'll probably participate in a project whose characteristics were such that I suggested the best way to proceed was to use a multiagent type of simulation. The problem was that there are many different, and popular, multiagent simulation environments, and I had to choose the one that will fit this project's use case the best. More specifically, the candidate multiagent simulation environments were MASON, NetLogo and Repast, which were constantly mentioned on the Internet, so I decided to evaluate them. Note that there are others, too. Lists of available software can be found here, here, and here. But, if you google a bit, you'll probably find many others.

In any case, the requirements I had in mind when starting evaluation process were:
  1. Free license. Preferably a BSD-like license, but LGPL, or even GPL, is OK.
  2. GUI that will allow easy experimenting with the model.
  3. Ability to model agents with very complex behavior.
  4. Ability to do distributed simulations is definitely a big plus.
  5. NOT exclusively Microsoft based, i.e. C# or something similar.
To be able to better evaluate those tools, I set myself the task of implementing something simple in the three different multiagent environments (MASON, NetLogo, Repast) and trying to determine which one will best suit my needs with respect to the requirements. Note that there are already existing comparisons, but I wanted to gain some first-hand experience of what it is like to use them. So, in order to do that, I modelled the following system in each one of them and recorded my experience along the way:
The system consists of N identical agents performing some task, emulated by using sleep or a similar statement/function. The task processing time of an agent has an exponential distribution with an average of 30 minutes. New tasks arrive according to a Poisson process with an average of one task every 45 minutes. It is necessary to determine the average time each task spends in the system and the average time it waits in the queue for processing.
For a start I'll set N to 1. So, note that in general this is a simple M/M/N queue, and with N set to 1 it is an M/M/1 queue. I'm going to complicate it a bit in due course, but this is what I'm going to start with. The reason why I chose the M/M/1 queue is that I'm able to compare simulation results with calculations.
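
For reference, the calculated values for this M/M/1 case follow from the standard formulas: utilization rho = lambda / mu, average time in the system W = 1 / (mu - lambda), and average waiting time in the queue Wq = rho / (mu - lambda). Here is a minimal sketch that evaluates them; the class and method names are my own, chosen only for illustration, and are not part of any of the simulation environments.

```java
// Illustrative helper with hypothetical names; it only evaluates
// the standard closed-form M/M/1 results.
class MM1Formulas {
    // Utilization: rho = lambda / mu.
    static double utilization(double lambda, double mu) {
        return lambda / mu;
    }

    // Average time a job spends in the system: W = 1 / (mu - lambda).
    static double timeInSystem(double lambda, double mu) {
        requireStable(lambda, mu);
        return 1.0 / (mu - lambda);
    }

    // Average time a job waits in the queue: Wq = rho / (mu - lambda).
    static double timeInQueue(double lambda, double mu) {
        requireStable(lambda, mu);
        return utilization(lambda, mu) / (mu - lambda);
    }

    // The formulas only hold when jobs arrive more slowly than they are served.
    private static void requireStable(double lambda, double mu) {
        if (!(lambda > 0.0 && mu > lambda)) {
            throw new IllegalArgumentException("need 0 < lambda < mu");
        }
    }

    public static void main(String[] args) {
        double lambda = 1.0 / 45.0; // arrivals per minute (one job every 45 minutes)
        double mu = 1.0 / 30.0;     // services per minute (30 minutes per job)
        System.out.printf("utilization    = %.3f%n", utilization(lambda, mu));
        System.out.printf("time in system = %.1f minutes%n", timeInSystem(lambda, mu));
        System.out.printf("time in queue  = %.1f minutes%n", timeInQueue(lambda, mu));
    }
}
```

With the parameters above this gives a utilization of 2/3, an average of 90 minutes in the system and an average of 60 minutes waiting in the queue, which are the numbers a correct simulation should approach.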

The posts describing use of specific environments are:

  1. Mason
  2. Repast
  3. NetLogo

While searching for the tutorials, examples and documentation about those simulation environments I wished to try, I found a lot of useful resources. Here are some:

  1. Open Agent Based Modeling Consortium
  2. Comparison of many more agent simulation environments using a single scenario
  3. Agent Based Modeling - a site with lot of resources



Friday, November 29, 2013

Modeling a simple system in Mason...

In this post I'm describing how to implement a simple agent model in the Mason multiagent simulation environment. See the introductory post for additional details about this endeavour.

Installing Mason

Mason installation is easy. Just download the newest archive and unpack it somewhere on the disk. That's all that has to be done. In the following text I'm referring to this unpacked installation, and everything is done within that directory. It doesn't have to be, but it is easier for a start.

Running simulation

The next thing is how to run a Mason simulation, and it turns out to be easy. As an example, I'll show you how to run the Tutorial2 example. This example simulates Conway's Game of Life and has a GUI that can be used to control the simulation. So, go to the directory where you unpacked the archive that you downloaded in the previous step, and then enter the sim/app/tutorial1and2 subdirectory. The Java file is already precompiled, but nevertheless, we'll compile it again because it is easy and instructive. To compile Tutorial2, issue the following command:
CLASSPATH=../../../jar/mason.17.jar javac Tutorial2.java
Note that the Mason framework is in mason.17.jar and that you have to specify it to the Java compiler using the CLASSPATH variable. The previous command shouldn't give you any messages. To run the compiled example, issue the following command:
CLASSPATH=../../../jar/mason.17.jar:. java sim.app.tutorial1and2.Tutorial2
All in all, compiling and running models built using Mason framework is relatively straightforward.

Evolving the target system

The idea I'll pursue in this section is to gradually build a simulation system. The simulation system will be represented by one class that will instantiate and control all the other classes. Those other classes I'll call agents. There will be an agent that represents a job, one for server(s) and one for a queue that will hold jobs until the server is free to take them.

The simplest possible simulation

We'll start with the simplest possible simulation in Mason, and that is the following one:
package hr.fer.zemris.queue;

import sim.engine.*;

public class QueueSystem extends SimState
{
    public QueueSystem(long seed)
    {
        super(seed);
    }

    public static void main(String[] args)
    {
        doLoop(QueueSystem.class, args);
        System.exit(0);
    }
}
To compile it, you have to place it into the hr/fer/zemris/queue directory (which corresponds to the package statement at the beginning of the source). I'll assume that this directory is in Mason's top-level directory. The name of the Java file has to be QueueSystem.java. In order to compile it, issue the following command:
CLASSPATH=jar/mason.17.jar javac hr/fer/zemris/queue/QueueSystem.java
and run it in the following way:
$ CLASSPATH=jar/mason.17.jar:. java hr/fer/zemris/queue/QueueSystem
MASON Version 17.  For further options, try adding ' -help' at end.
Job: 0 Seed: -1713501367
Starting hr.fer.zemris.queue.QueueSystem
Exhausted
Don't forget the dot at the end of the CLASSPATH variable's value, or else you'll get an error about being unable to find a class.

This simulation is a very simple one and, as expected, it doesn't do anything useful. All it does is call the doLoop method of the SimState class, which will instantiate a QueueSystem object. In our case, we didn't specify anything for the simulation, so nothing happens.

In the following text this simulation will be extended so that it creates and coordinates other agents.

First agent

Ok, let's create an agent. Our initial agent will, again, be very simple. It will only print that it was instantiated, and nothing else. So, here it is:
package hr.fer.zemris.queue;

import sim.engine.*;

public class Server implements Steppable
{
    public Server()
    {
        System.out.println("Instantiated one Server");
    }

    public void step(final SimState state)
    {
        System.out.println("step() method called");
    }
}
Note that we have to define the step() method, because it is required by the Steppable interface. But, for the moment, it doesn't do anything except print a message.

Ok, to compile this agent, use the usual command:
CLASSPATH=jar/mason.17.jar javac hr/fer/zemris/queue/Server.java
Again, I assumed that you are positioned in Mason's root directory, and that the agent is placed within the hr/fer/zemris/queue directory in a file called Server.java.

Note that you cannot run agents directly, at least not in this form (i.e. without a main method). So, we'll instantiate and schedule the execution of our agent in the main class that represents the whole simulation. The change is simple; in the class QueueSystem add the following method:
public void start()
{
    super.start();

    Server server = new Server();
    schedule.scheduleOnce(server);
}
Now, recompile the QueueSystem class and run it:
$ CLASSPATH=jar/mason.17.jar:. java hr/fer/zemris/queue/QueueSystem
MASON Version 17. For further options, try adding ' -help' at end.
Job: 0 Seed: -1710667392
Starting hr.fer.zemris.queue.QueueSystem
Instantiated one Server
step() method called
Exhausted
Note the two new lines in the output, "Instantiated one Server" and "step() method called". The first line is printed when the constructor of our simple agent is called. The second one is output when the agent's step() method is called. Note that the step method was called only once, and that is because we used the method scheduleOnce(), which schedules a single occurrence of an event. Try changing scheduleOnce() into scheduleRepeating() and see what changes.

There is also the question of when this event was called. We used a simple version of the schedule methods that schedules execution 1 time unit in the future, i.e. at getTime() + 1.0. Well, at least the documentation says so! Try to check it yourself. Hint: to get the current time in the agent's step() method, use the state.schedule.getTime() method.

Creating jobs

Jobs are a bit different. They are not created at the start of the simulation, but are instead created dynamically according to a Poisson distribution. So, what I'm going to do is create a class named JobFactory that will create Job objects. Each job will be represented using the following class:
package hr.fer.zemris.queue;

import sim.engine.*;

public class Job
{
    public double createTime;
    public double processingTime;
    public double finishTime;
}
Note that a job isn't an agent! It doesn't have a step() method, nor is it subclassed from one of Mason's classes. What I decided is that the Job class will only have fields to keep statistical data, and that's it.

To create jobs, I wrote the JobFactory agent. Here is the agent:
package hr.fer.zemris.queue;

import sim.engine.*;
import sim.util.distribution.*;
import ec.util.MersenneTwisterFast;

public class JobFactory implements Steppable
{
    private Poisson poisson;
    private Exponential exponential;
    private QueueSystem queueSystem;

    public JobFactory(double lambda, double mu, QueueSystem qs)
    {
        MersenneTwisterFast randomGenerator = new MersenneTwisterFast();
        poisson = new Poisson(lambda, randomGenerator);
        exponential = new Exponential(mu, randomGenerator);
        queueSystem = qs;
    }

    public void step(final SimState state)
    {
        double currentTime = state.schedule.getTime();
        double nextEventTime = currentTime + poisson.nextDouble();

        Job job = new Job();
        job.createTime = currentTime;
        job.processingTime = exponential.nextDouble();
        queueSystem.pushNewJob(job);

        state.schedule.scheduleOnce(nextEventTime, this);
    }
}
So, how does this JobFactory agent work? First, we have a constructor. The constructor instantiates two classes, Poisson and Exponential, that will be used to generate random numbers from the respective distributions. The first two parameters of the constructor parameterize those distributions. The third parameter is used for sending newly created jobs into the system queue.

Note that, apart from generating new jobs according to the Poisson distribution, we also have to specify how long a single job will be processed within the server. I think that the natural place to determine this is when the job is created, since it is a characteristic of the job itself.
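
One detail worth keeping in mind here: in a Poisson arrival process the number of arrivals per unit of time is Poisson distributed, while the gaps between consecutive arrivals are exponentially distributed. It is also worth checking in the library documentation whether a given distribution class is parameterized by its mean or by its rate. The following sketch does not use Mason's distribution classes at all; it relies only on java.util.Random and draws exponential values with the inverse transform method. The class name and the decision to parameterize by the mean are my assumptions, made purely for illustration.

```java
import java.util.Random;

// Hypothetical stand-alone sampler; not part of Mason.
class ExponentialSampler {
    private final Random random;
    private final double mean;

    ExponentialSampler(double mean, long seed) {
        if (!(mean > 0.0)) {
            throw new IllegalArgumentException("mean must be positive");
        }
        this.mean = mean;
        this.random = new Random(seed);
    }

    // Inverse transform: if U is uniform on (0, 1], then -mean * ln(U)
    // is exponentially distributed with the given mean.
    double next() {
        double u = 1.0 - random.nextDouble(); // nextDouble() is in [0, 1), so u is in (0, 1]
        return -mean * Math.log(u);
    }

    // Average of n samples, useful as a quick sanity check of the sampler.
    static double sampleMean(double mean, long seed, int n) {
        ExponentialSampler sampler = new ExponentialSampler(mean, seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += sampler.next();
        }
        return sum / n;
    }

    public static void main(String[] args) {
        // Gaps between arrivals with a mean of 45 minutes, as in the task description.
        System.out.printf("average gap = %.2f minutes%n", sampleMean(45.0, 42L, 100000));
    }
}
```

The same sampler could serve for processing times by constructing it with a mean of 30 minutes.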

I thought about sending Job objects directly to the server agent. But the problem with that approach is that the server has to schedule itself in case there are no other jobs waiting, i.e. when the job immediately enters the server. Namely, the server has to wake up when a job is finished and remove it from the system.

But, in order to be able to do the scheduling, I had to have access to the SimState object, which is accessible only from the step() method. Now, I could save this object internally, but it would be a hack. Namely, I would have to somehow provoke step() to be executed immediately at the beginning. Oh, yeah, I could pass the SimState object via the constructor. But in the end, I gave up on this approach, as I wasn't able to find anyone else already doing this (neither in the examples directory nor on the Internet).

The second part of the JobFactory class, and its workhorse, is the method step(). What this method does is create a new Job object, initialize its processing time (job.processingTime) and add it to the queue of jobs waiting for the server (via a call to the method queueSystem.pushNewJob). The method also draws a new random number from the Poisson distribution, which defines when the next job will be created, and schedules itself at that point in time.

Ok, our simulation class, QueueSystem, has to have a method for accepting new jobs. This method has name pushNewJob, and the code is the following:
public void pushNewJob(Job job)
{
    jobQueue.add(job);

    if (jobQueue.size() == 1)
        schedule.scheduleOnce(schedule.getTime() + job.processingTime, server);
}
jobQueue is a linked list used as a FIFO queue. It holds both the job being processed by the Server and the jobs waiting for the Server. The job at the front of the queue is the one currently being processed. Maybe I should have written the code a bit differently, i.e. so that the Server holds the job it processes in some internal attribute, but I did it this way and I didn't bother to rewrite it.

Apart from adding the new job to the queue, there is one additional thing I had to do. If the queue was empty before the new job arrived, the server is idle and is not scheduled for execution! So, the if statement checks for this condition (the queue holds exactly one job after the add), and if the server is idle it schedules the server's execution for the time when the job finishes. Otherwise, the server will execute at some point, take the next job, and schedule itself. We'll come to that part a bit later.

One more thing hasn't been specified with respect to QueueSystem, namely the creation of jobQueue and the activation of JobFactory. The Server isn't activated until there is a job, and that is handled by the pushNewJob method.

So, in order to take care of that, here is the new start() method of the QueueSystem simulation class:
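The idle-server check can be tried out without MASON. The following is only a minimal sketch under my own assumptions: the class IdleServerSketch, the serverWakeUps list (a stand-in for MASON's schedule) and the explicit now argument are all hypothetical names invented for illustration, and the serverStep() method is a stripped-down version of the Server logic without any statistics.

```java
import java.util.ArrayDeque;
import java.util.PriorityQueue;
import java.util.Queue;

// Hypothetical stand-in for QueueSystem + Server, with no MASON dependency.
class IdleServerSketch {
    static class Job {
        double createTime;
        double processingTime;

        Job(double createTime, double processingTime) {
            this.createTime = createTime;
            this.processingTime = processingTime;
        }
    }

    // FIFO queue; the head is the job currently in the server.
    final Queue<Job> jobQueue = new ArrayDeque<>();

    // Assumption: a sorted list of times stands in for the simulation schedule.
    final PriorityQueue<Double> serverWakeUps = new PriorityQueue<>();

    // Same idea as pushNewJob: enqueue, and wake the server only if it was idle.
    void pushNewJob(Job job, double now) {
        jobQueue.add(job);
        if (jobQueue.size() == 1) {
            serverWakeUps.add(now + job.processingTime);
        }
    }

    // Simplified server step: the head job is done; start the next one, if any.
    void serverStep(double now) {
        jobQueue.remove();
        if (!jobQueue.isEmpty()) {
            serverWakeUps.add(now + jobQueue.peek().processingTime);
        }
    }
}
```

If a second job is pushed while the first one is still being served, no new wake-up is added. The server picks that job up itself when it finishes the first one.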
public void start()
{
    double alpha = 3;
    double beta = 5;

    super.start();

    jobQueue = new LinkedList<Job>();

    server = new Server(jobQueue);

    jobFactory = new JobFactory(alpha, beta, this);
    schedule.scheduleOnce(jobFactory.getFirstInvocationTimeStamp(), jobFactory);
}
So, what's going on in this method? There are the alpha and beta parameters of the M/M/1 queue. Next, I'm initializing the FIFO queue, jobQueue. It's defined as an attribute of the QueueSystem class:
Queue<Job> jobQueue;
Then, the server agent is instantiated. Note that I'm passing the queue to the server. That is necessary since the server has to take jobs from the queue. I'm also instantiating the JobFactory agent. Finally, I'm scheduling the initial run of JobFactory.

There is a small problem. Namely, I have to schedule the first invocation according to the Poisson distribution. It is not correct to invoke it immediately, at least not in the form I wrote it. And this class, QueueSystem, doesn't have access to the Poisson distribution in order to get the first random number. It would also be an error to create another Poisson distribution. So, I added a method to the JobFactory agent that returns the first random number. It is the following method:
public double getFirstInvocationTimeStamp()
{
    return poisson.nextDouble();
}
and you should place it in JobFactory agent/class.

OK, the final piece of the puzzle: the Server agent. First, the constructor is now a bit different; namely, it has to take a reference to the queue:
public Server(Queue<Job> jq)
{
    jobQueue = (LinkedList<Job>) jq;
}
The step() method is also a bit more involved:
public void step(final SimState state)
{
    Job job = jobQueue.remove();
    job.finishTime = state.schedule.getTime();

    jobs++;
    systemTimeAvg = systemTimeAvg + (job.finishTime - job.createTime - systemTimeAvg) / jobs;
    jobNumberAvg = jobNumberAvg + (jobQueue.size() - jobNumberAvg) / jobs;
    currentStep++;
    if (skipSteps == currentStep) {
        System.out.println(systemTimeAvg + " " + jobNumberAvg);
        currentStep = 0;
    }

    if (jobQueue.size() > 0) {
        job = jobQueue.peek();
        state.schedule.scheduleOnce(state.schedule.getTime() + job.processingTime, this);
    }
}
What does this method do? First, it removes the job at the front of the queue, i.e. the job that was just processed by the server. Then, it updates the statistics and periodically prints them. Finally, it checks whether there is another job in the queue, and if there is, it schedules an invocation of itself for the time when that job finishes.
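The two averages in step() are updated incrementally, so no list of past values has to be kept. Here is a small sketch of just that update rule, assuming a hypothetical helper class RunningMean that is not part of the original code:

```java
// Hypothetical helper isolating the update used for systemTimeAvg and jobNumberAvg:
// mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n
class RunningMean {
    private long count = 0;
    private double mean = 0.0;

    // Fold one new observation into the running mean.
    void add(double x) {
        count++;
        mean += (x - mean) / count;
    }

    double get() {
        return mean;
    }

    long count() {
        return count;
    }
}
```

Feeding it the values 1, 2, 3 and 4 gives the intermediate means 1, 1.5, 2 and 2.5, the same as summing everything and dividing by the number of values.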

Basically, that's it.

Tuesday, February 5, 2008

DDoS attacks, Internet, new Internet and POTS...

I was just thinking about the many initiatives (e.g. GENI) to design the Internet from scratch! It certainly requires us to break out of the current way of thinking, which has been with us for about 40 years now, and to find and propose something new. A good example of such a breakthrough was the Internet itself, i.e. the concept of a packet-switched network. As a side note, Van Jacobson has an idea of what this new thing might look like, and I recommend that the reader find the lecture he gave at Google on Google Video.

While thinking about what this "new" thing is, I took DDoS attacks as an example. There are no DDoS attacks in POTS, and they are a big problem for the Internet. So, how should this new mechanism work in order to prevent DDoS attacks? The key point of a DDoS attack (or, more generally, a DoS attack) is that there are finite resources that are consumed by the attacker, and thus regular users cannot access those resources: they are denied service.

And while I was thinking about it, I realised that a DDoS attack is actually possible in POTS too, as there are also finite resources. OK, OK, I know, I managed to reinvent the wheel, but hey, I'm happy with it. :) So, if they are possible, why are there no DoS attacks in telephony? The key point is that end devices in POTS are dumb and thus not remotely controllable. If they were remotely controllable, then an attacker would be able to gain access to them and use a huge number of those devices to mount an attack on a selected victim. Maybe this attack would be even more effective than the one on the Internet, since resources taken by end devices are not shared even when the end devices don't use them.

It turns out that the DDoS attack is actually a consequence of giving more power to the user via more capable end devices. Furthermore, because those end devices are complex systems, it's inevitable that there will be many ways of breaking into and controlling them.

Of course, someone might argue that the problem is in the ease with which IP packets can be spoofed. But this is actually easily solvable, at least in theory, if each ISP checked its access network for spoofed addresses. The more serious problem is a DoS attack made with legitimate IP packets. It is traceable if it comes from a single source, or a small number of sources, but the real problem is a network of compromised hosts (a botnet). There is no defence against those networks, as they look like legitimate users.

So, because we are limited by the real world and will always have only finite resources at our disposal, it turns out that the only way of getting rid of DDoS is to restrict end devices, which by itself is impossible. Now, this is thinking within the current framework. But what if we could make a finite resource apparently infinite, or somehow restrict end devices... This is something for further thinking...
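To make the anti-spoofing check a bit more concrete, here is a rough sketch of what an ISP's access router conceptually does: it accepts a packet only if its source address lies inside the prefix assigned to the customer it came from. The class IngressFilterSketch and its methods are hypothetical names for illustration, the sketch handles IPv4 only, and the addresses used with it are documentation addresses. Real routers do this in the forwarding path, not in Java.

```java
// Hypothetical illustration of source-address (ingress) filtering for IPv4.
class IngressFilterSketch {

    // Convert a dotted-quad IPv4 address into a 32-bit integer.
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + dotted);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // True if the source address belongs to the customer's assigned prefix.
    static boolean allowed(String source, String prefix, int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("bad prefix length: " + prefixLength);
        }
        // Java shifts are taken modulo 32, so /0 needs its own case.
        int mask = (prefixLength == 0) ? 0 : (-1 << (32 - prefixLength));
        return (toInt(source) & mask) == (toInt(prefix) & mask);
    }
}
```

With the customer prefix 192.0.2.0/24, a packet claiming to come from 192.0.2.17 passes, while one claiming 198.51.100.5 is dropped. As noted above, this does nothing against a botnet, since its packets carry genuine source addresses.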

About Me

scientist, consultant, security specialist, networking guy, system administrator, philosopher ;)
