Code Monkey

Friday, July 8, 2011

Pointers with *, &, ->, .

This is one of those things that never gets old; discussing pointers. With my heavy C background and my insatiable desire to program close to the hardware; I would say that I love pointers. But most people, when confronted with pointers, tend to loose interest and also their minds trying to understand what they mean. So here, I'm going to try my best at explaining pointers in a way that makes sense; for this I'll be explaining pointers from the reference of C.

First let's cover some background on variables, definition is pretty straight forward. Below you can see how to create an int variable.

int  x;

So let's get into the thick of it, what is a pointer? A pointer is a variable to a memory location! Wait, that's it? Is it possible that I am misconstruing what a pointer is? Not at all, instead the confusion comes when you begin explaining how to USE them. So let's see how to define a int pointer.

int  *x;

Yes, that is it, I've defined a pointer; the variable is interpreted as such; x is a variable that contains a memory location to an int. But wait, that cannot be it; people tell me that pointers are really difficult to use. Yes, it is a little bit harder, see you have to dereference the variable x to get the integer that is pointed to by the memory location stored in x. Ok, so now your eyes have glazed over right? You've decided to continue on to some other blog and see if someone else can explain it better but STOP and take a deep breath.

Let's look at a better example from our pointer variable x. Let's assume that we have a section of memory that is the memory representation of an int (4 bytes long).

 ---------------
| 4 bytes, woot |
 ---------------

Let's also put some byte numbers to this.

 ---------------
| 4 bytes, woot |
 ---------------
^ 0x1       0x5 ^

Ok, ok so what about our int *x? What the hell does that even mean and what is contained in there? Well, are you ready? It contains the address of our int (the 4 bytes above). So wait, x = 0x1? That's right! But now you have to "dereference" x. So bare with me; you have x which contains the memory location of our int. We need to get THE int. Therefore we need to get the data AT the memory location right? HA, that is dereferencing. So how do we do it?

*x=10;

No, it cannot be that simple... Yes, actually it is! The * tells the compiler to treat the variable as a pointer and to use the current variable as a memory location. So how do we know that it is an int? Aha! Because we defined it as an "int *" which means that inherently when we dereference x, *x must be an int. So now that we understand that the * indicates a reference and is used to dereference a pointer; what the hell is that ampersand '&' used for?

Well, the ampersand is used to indicate "reference of." Wait, I thought that was what * was used for. No, I said * was to indicate a variable is a reference or is used to dereference a reference. The & is used to say the memory location (reference) of a variable. Let's see this from our original variable x (I shall refresh your memory).

int  x;

See how x is just a nice happy variable? Not a pointer! So let's look at some more memory locations. It is important to note that since x is not a pointer, the memory layout below IS the x.

 -------------------
| 4 bytes, oh noes! |
 -------------------
^ x IS here

So when you specify &x, what are you saying? You're getting the memory location OF x! WHAT!?!?! That doesn't make any sense! how can you GET the memory location; yea I know you want to switch blogs again STOP and let's take a deep breath.

Remember that x IS a variable and therefore the variable MUST lie in a memory location. So when you do &x you are merely getting the memory location of that variable.

 -------------------
| 4 bytes, oh noes! |
 -------------------
^ 0x1           0x5 ^

So why would I want to do this? Because in order to create a pointer, it must reference a valid memory location. Check this out.

int  x;
int  *y;
y = &x;

OH MY GOD, MY HEAD EXPLODED! Nah, it's not that bad; until of course I tell you that *y and x are the same variable. If you don't understand that, then check it out, let's say that x is at location 0x1 (taking from our previous example). Then when we get the variable location 0x1 and store it into a variable that HOLDS the memory location (y), then when we dereference (there is that stupid ass word again!) *y we end up with an int at the location 0x1. OH SHIT, THAT IS THE LOCATION OF x!!!! GET SOME!

So what happens to x if I do *y = 22? Right, x = 22! Why is that? Because *y and x are THE SAME VARIABLE! I'll say it again if you want me to but I'm sure you're getting tire of me explaining it; so I'll say this if basic pointers don't make sense, you should start reading this over again (if you don't understand iteration, you will by the end of this).

Fantastic, so now we understand the basics of pointers. But what about those stupid ->'s and when can I use .'s? Ok, we're getting ahead of ourselves. So if you know about structs (if you're doing anything in C I'm making the assumption you've heard of them) but if not let's look at an example.

typedef struct str{
  int x;
}str;

So here we have a basic struct; essentially it's like a class with only public members (and no methods) in Java. If you don't know what a class or a struct is; you shouldn't be messing around with pointers. So let's look at referencing the variable x inside of an example str.

str s;
s.x = 10;

Well that wasn't too bad; you use the . operator to indicate a member of the str struct. But here is the thing; let's say that you wanted s to be a pointer. Why? Well, we'll get to that; but right now just follow (I haven't steered you wrong yet!). Let's assume that our variable s is declared as below.

str *s;

Then we want to access x. Ok, simple right, s.x; no... s->x; oh holy crap you're confusing me! Nah, it's really not that bad, s was a pointer right? So what made you think that you could just access x as if s was the variable!? Oh I get it, s is a pointer, which means we had to dereference it to get x. So that means that -> dereferences s and gets the variable x from the struct! Right! if you wanted to write it (and seem cool) you could do (*s).x but no one would hire you because that would be horrible to read everyday.

So wait, that's all the -> operator means? Yes! See, I told you pointers weren't that bad. But of course, there is the big question; how do I actually USE pointers? Dynamic array sizes. What does that have to do with anything? Easy, the function malloc() which allows you allocate a section of memory. Let's say I want to get an array of 2 integers.

int * x;
x = malloc(sizeof(int) * 2);

If you don't know malloc, you'll need to go check out the documentation but essentially we are just allocating the size of two integers (8 bytes) in memory and getting the memory location. Storing it into a pointer. What does this look like in memory?

 ------------------
| 8 bytes, badness |
 ------------------
^ 0x1          0x9 ^

And of course x holds 0x1. Now, since we allocated 2 ints it is important to note that THERE ARE TWO OF THEM. So how do we get them? i[0] references the first i[1] indicates the second. WAIT, huh? Ok, so i[0] dereferences i and then goes to an offset of 0. i[1] dereferences i and then goes to an offset of 1. So where does this offset come from? Easy, the offset is defined by the SIZE OF THE VARIABLE TYPE! Since our variable type is int, the size is 4 bytes; so i[1] will go to 0x1 + 4 = 0x5.

So can I name another reason to use them? Sure! If you wanted to modify a variable without having to return it. What? That makes no sense. Sure it does, let's say we want to take in a str variable (again stealing the definition from above).

void modify_str(str * s) {
  s->x = 10;
}

OH SHIT! That makes sense, yes! Is that awesome or what!? So here is the cool part; returning pointers. Example, let's say that we want to create a new str, and set the internal x to 1983 (my birth year because I'm picking random numbers).

str *new_str(int  i) {
  str *s = malloc(sizeof(str));
  s->x = i;
  return s;
}

OH DAMN! This makes initializations work! All of this is starting to make sense; why we would need to use pointers, how pointers work, and also how to effectively use pointers. Yes, and it really isn't as bad as many people tend to think. So here is the thing, pointers are pretty straight forward; double pointers and higher become much more abstract.

DISCLAIMER
I am not going to cover double pointers; but since you understand pointers now; just think of them as pointers to pointers (since that is what they actually are). And more importantly, you really won't use double pointers until you start needing to do pointer arithmetic in order to do iterations.

I'll say that I'll cover double pointers at a later time; but I have no guarantees that it will happen. If it does, hopefully I will be as fun and exciting as I have been so far! Just remember, pointers store memory locations of variables.

Wednesday, May 18, 2011

How can you not use <insert favorite IDE here>?!

This is one of those subjects that really bugs me is when people promote an IDE to the point that they are insisting that you are a complete failure for not using that IDE. So I realize that people always have an IDE that they really like. Mine is Eclipse for Java development; however, for C based applications I prefer to write in a text editor (vim is my poison). I have my own reasons, which I will outline below, for why I like these two setups.

Eclipse I like for Java style development, it has some really great auto-complete. However, I also like the simplicity for adding external jars and linking projects together as dependencies. Also there are quite a few good plugins for things like Ivy and Subversion that give tons of extra functionality. Why not Netbeans? I've never been impressed with the auto-complete and the auto-populate for import statements. Also I hate the fact that most class names will be fully qualified which bugs me (this may have been changed in a more recent version but this is how it was in a previous version).

Vim I like for C based applications because I've never found that any IDE that gets out of the way and has a good built in debugger. Frankly when I'm writing in C I really don't need that much help since with Vim I get CTags. This gives me the ability to investigate a function on the fly as well as investigating structs.

I realize that I am putting forth these statements and it sounds like I am breaking the rule that I put forward. Well not quite, I'm not stating that anyone else should work the same way that I work. Instead I am stating the reason(s) why I am doing something. The thing that I cannot stand is when I say "I use Eclipse for Java" and someone else goes "you should really use NetBeans, it's the awesomest and you are an idiot for not using it." I've also heard the argument "well you don't use an IDE anyway (for C development) so why not use something?" Why not? Because I'm more effective in vim, I have very few problems within my type safety so I don't need an IDE checking my type safety everytime I finish a word.

So what exactly is my point? Well, I'm not saying that you shouldn't propose someone using an IDE that you are used to; instead I'm stating that you can propose an IDE, just don't push that idea to death. It annoys me when I am working in a specific environment and doing well with it; and someone comes along and goes "well boys, we're going to change things up and go with this IDE, everyone should now be using it." Get off my back, I do more work in 2 days than you do in a week; maybe you should adopt my environment because it seems to work. Oh wait, you have problems working in my environment? Then why would you push me into an environment that I might have problems working in?

Sunday, April 24, 2011

How much is too much dereferencing?

This is mainly focused in any language where you can have composite classes or structures. Just to explain this, where you have classes inside of classes or structs inside of struct. Consider the following.

C Example with single inside struct


struct inside {
  int i;
};
struct outside {
  struct inside i;
};
int main(int argc, char *argv[]) {
  struct outside o;
  o.i.i = 10;
  return 0;
}

Does this really make sense when we look at

o.i.i

? If it does, let's look if we changed inside like the following.

C Example with deeper inside struct


struct deeperinside {
  int i;
};
struct inside {
  struct deeperinside i;
};

Now, we call the internal i as o.i.i.i which becomes pretty unruly. Obviously using the same variable name each time would be pretty terrible. But even changing the name doesn't always help. Let's see what happens if each element is actually a pointer.

C Example with deeper inside struct pointers


struct deeperinside {
  int i;
};
struct inside {
  struct deeperinside *i;
};
struct outside {
  struct inside *i;
};
...
struct outside *o;
...

So now we reference the int i from deeperinside as o->i->i->i This is completely unreadable, so let's see what happens if we at least add some casting.

C Example Access to deepinside variable


((struct deeeperinside *((struct inside *)(o->i))->i))->i

Although at least you can understand this, it becomes completely unreadable; so what is the answer here? How about we create some temp variables that can be used to store our inner variables?

C Example Access to deepinside variable through temp variables


struct inside *tempinside = o->i;
struct deeperinside *tempdeeperinside = tempinside->i;
tempdeeperinside->i

Notice that this becomes much more readable and understandable. This allows the people who will be maintaining the code much happier and faster to find/fix issues as they may arise. Now I've been focusing mainly in C but this also applies to languages such as Java. Why Java? Because A) it's the language that many programmers are programming in today; and B) it's much easier to encapsulate classes inside of classes (and is usually encouraged). By bringing up the idea of how we deal with accessors to these inside classes we can ensure that people at least understand how they can better display them to make them more manageable.

Another option in an actual OO language would be to encapsulate those objects, this in and of itself means that those o.i.i.i would turn into o.getI().getI().getI() which again makes no sense; are those composite classes (classes that encapsulate a class of the same type)?

Remember, making code reusable is good; making code understandable is much more important!

Friday, April 15, 2011

Do you like programming or YOUR programming?

This is a very interesting question that I've seen come up again and again; although I've never know the best way to express it until now. Out of all the languages listed below which ones can you not stand? If presented with that specific language to keep up; would you do it or would you quit? The question is, do you like Programming or do you like YOUR programming?

Perl


sub Example($) {
  print $_[0] . "\n";
}

Lisp


(defun Example (var)
  (print var))

C/C++


void Example(char *var) {
  printf("%s\n", var);
}

Java


class Temp {
  public void Example(String s) {
    System.out.println(s + "\n");
  }
}

If you looked at any of these languages and you said to yourself "I'd never program in that language" then you've just answered the question. Many of us developers are presented with a code base and are asked to make changes or to fix it. We normally don't have the ability to tell the client "well I can change it to language X in about 5 weeks if you'd like" because the client will say "what do I gain from this?" The answer is; well you get nothing extra but it will be easier to maintain. This then presents you with a big no because the client isn't going to pay you to rewrite an already running application into a language that you want (mainly because someone else will come along with this same statement).

We all have languages that we like; we all have languages that we're good at. This is true, but being scared of a specific language to the point that you would leave a job instead of program in it is just ludicrous. My point is not that you should seek out work in languages that you are not familiar with; but instead be open to working in different languages. The first step to working in an unfamiliar language is to find out how to do things that you would normally do in your known languages. NOTE: I do not condone going out and buying a book on the language and trying to cram it all in! Instead pick up on new ways of doing things as you are programming.

One of the big pitfalls that I've seen is someone coming into a team with programming knowledge. They then try to apply their previous programming experiences to a language that you might not know. Because this is the case, always listen to your teammates on how to better work in the language. Also, if you believe that your way of programming is getting ridiculed and you feel it is better than the way being suggested you should stand up for your way. Now be open to change and listen to why someone is making the argument that your way is bad/inefficient etc... Make sure to get them to explain why their way is better (and yes understandability is a feasible reason for example a .foreach() method rather than for(int...) {}).

So why bring this up? I have a coworker who is a DBA in MySQL who is going to be leaving. They have always been a huge proponent of DB2; always trying to get the entire organization to DB2. Now, the company I work for is not a small 50 person group; I work for a medium 500 person who just got bought out by a 100k person corporation. The problem with pushing these types of changes is that once you grow so large the company is not going to do a massive huge change just because someone feels that they can better manage a DB2 instance. This would require every application we use to be changed from MySQL to DB2 connectors which would be quite a large problem since almost every application we have developed is in C. We already have MySQL libraries that are well tested and work well; so having to switch to DB2 would be a huge change that the company was not wanting to risk.

So now my coworker is leaving because they didn't get their way. So how does this relate to my original example? Well think about this; it's not just dealing in programming where you shouldn't be afraid of working with new languages. Instead, you should always be open to work in new applications and expand your horizons. You come from a completely Windows shop and get hired as a Linux administrator. Don't be afraid, pick up that Linux administration book and install a copy at home. Play with it, break it, fix it, learn it. The way I see it, we as technology professionals should always be willing and eager to learn about new features in the technology field and not just limit our learning to ONLY our specific field. That's right, if your a Linux admin go learn something about networking or Windows administration. If you're a developer, go learn something about networking or host administration. If you're a DBA who works with DB2, go pick up a copy of Oracle or MySQL and see how they compare; maybe you'll find something interesting from that other piece of software that you would like better.

Thursday, April 14, 2011

Unit testing the proper way

Many times I've seen Unit tests that really don't help out. You will often find Unit tests that will quite often be like

public void TestNullInput() {

  MyObject obj = new MyObject();

  obj.functionThatShouldNotFail(null);

}

This really doesn't help that much; now it does need to be done especially in languages such as C or C++. You must always check what happens if you pass a function a NULL pointer. However, many times we see these that people will put in Unit tests for the most obvious tests. But what happens what you end up with the more non-obvious tests? Look at this example below and try to figure out what we're testing for.

public void TestInput() {

  MyString str = new MyString();

  str.parse("Hello|world!"); 

  Assert.assertEquals(str.parsedCharCount(), 

    "Hello|world!".length());

}

Well, it's really not that obvious at first; we just think oh why would we care about checking for the parsed amount of letters. Well here's the thing; let's assume that we're doing parsing and we need to do something special when we hit the '|' character. Let's assume that we are skipping it or something to that effect; why would we be looking for 12? We if we think about C/C++ (not so much managed languages but we'll assume that it might happen in their case as well) then it is very easy to run off the end of the array. So what we do is to count the number of characters that we are parsing; this allows us to check and ensure that we ONLY parsed the number of characters that was possible in the string (the max length of the input string).

The idea isn't just to throw out tests like this in every possible case before declaring that your code is ready. The point of bringing this up is that when you come across an error or a segmentation fault etc... there is always a way to write a Unit test to check for an error. Unit tests provide a way to detect bugs that may have regressed; and because we can detect them when they regress it can provide us a way to fix problems that we may have resent out to the consumer.

Thursday, February 24, 2011

Developing scared software

This is something that I've begin to notice and have a real problem with. Now, it may not be you but you know SOMEONE that has done this. A problem/issue arises or there is a deprecated library call that is being used. It is something that has been working for quite sometime but might be either holding back new features because it is no longer supported; or it's not broken enough to force someone to fix it. This is where the people come in and say "well this needs to be fixed" but they never make any changes to it.

Sometimes this is accompanied by a "it's in a library" or is followed up with "we'll fix it at some point in the future when we have to." So why is this an issue with me? Easy, it's a lame excuse; if you see that there is a problem YOU, as the developer, should be man enough to step up and fix the issue that is starting at you. Yes, I do understand that there are processes and bugs that need to be prioritized; but if you see that it is an issue then you should make the case to find time to fix it.

I've seen this from a few different developers and it kills me when I hear it. "Oh yea, I'd love to replace that." Really? If you would love to do it, then do it. Especially if you see the problem and you know what the acceptable fix is; this means that you've already thought out a plan to fix the issue and as such you should put your fingers to work fixing the issue. Your users may not thank you, but your peers will once they see that you're working to make the product easier (hopefully with your fix) to maintain.

The reason that I call this scared software is that the software is coded, almost on eggshells, in a way that no one really wants to make big changes to it. Thus, to get a new feature (or to perform a bug fix) many people will end up writing tons more code to work around the issues that no one wants to fix. The solution is simple; if you see an issue or see an improvement, do it.

Tuesday, February 22, 2011

Bugs are a part of life; and users will hate you regardless

Something that I consistently have issues with is the support (or lack there of) for development groups by the end users. It is consistent that I hear complaints about products, and of course I'm not just speaking directly towards MY product, that it's "unstable" or "unusable" or that it is a "worthless piece of garbage." Many of these I've heard for software such as Microsoft Windows; or some other piece of software that has a tendency to crash. The product that I am currently working on is a fairly stable product although it has its own issues. Now I can understand and even empathize with crashes that can occur from day-to-day usage of software. People bring up applications such as Apache, to which I pleasantly remind them that Apache does a fork() and creates new processes constantly to handle requests. If a fork crashes, it continues on its merry way; with a GUI based application there really isn't a whole lot to fall back on which is why you see "death" if something bad occurs.

So what do I mean when I say that I empathize? It means, I'm sorry you found an issue and I realize that it is a problem. If you can put in a bug within our bug tracking system, I will be more than happy to track it down and fix it in a future build. Now this ends up with 3 scenarios; the first is the happy scenario where the user creates the bug and you fix it and they are happy. The 2nd scenario is the more likely; something else more pressing comes up (maybe they never crashed in the same manner again, but some other fire is burning and you need to throw some water on it). At which point the user comes back 3 months later and goes, "what's up with this bug? You asked me to put it in and you haven't fixed it."

Now you could have the best come back ever with "well, an issue was found that was deleting everything from our data store and I had to work to get that fixed which cropped up 10 other issues and in each case again stuff was being deleted from random data stores." Your user will still complain that you have not fixed the issue that THEY put in. I've tried this in multiple ways where instead I would create the bug; but that just seems to exacerbate the issue because the user will come back and complain that you didn't enter the information correctly and that the issue THEY saw was not the issue that you KNOW is there.

Now there is the 3rd scenario which is also more likely than the 1st where you fix the issue. Let's say that there is another 2 week ~ 2 months before a release and the user comes back and asks you if the issue is fixed; you will then explain that it is scheduled to be released. The user then walks away feeling cheated because it has been "fixed" but yet they see NO output; only that their bug has been "fixed." The user, from that point on, begins to get weary about submitting bugs since they don't actually see these fixes and by the time there is a release they have already forgotten about it. This is the one that I see the most; and as time goes on, the users usually become disappointed with the software since they consistently see bugs but don't really get instant gratification that the bug was fixed.

It doesn't matter how many bugs you fix; or how difficult the bugs are that you have fixed. You will always be known as the person who fixes the stuff that they broke to begin with. The only real way to help you to save the relationships with the clients is to make sure you are talking with your end users as often as you can. Finding out how the end users are working with the software, determining whether they are having issues with the software and getting feedback. The best way to determine how usable an application is, is to actually use the software. This is extremely difficult in many circumstances if the software that you are writing requires a high skill level to effectively use. The consumers of our product are GCIA certificate Security Analysts; it would be difficult for every developer (who did not have a security background) to actually go and analyze traffic logs.

The other side is to hide bug tracking as much as possible; there are always complaints about the way the software works; but things like crashes should have automated reporting. This shields the consumers from actually having to create the bug and become invested into a bug (whereby they will care about the state of that specific bug) and thus have no expectations of when that bug might be fixed. You can see this during crash reporting in Windows or KDE where we see the crash reporter popup asking if we want to submit the crash report to Microsoft or KDE respectively. This allows the developers to look into the issues from a completely object standpoint rather than having a bug which was written by an angry user who had been writing a document for a client which then crashed right before they saved which says

Hey, your crap software crashed when I was writing my document; thanks for wasting 2 hours of my time! Fix it!