November 14, 2008

Linux commands that help me to create a correct environment for debugging

To begin with, I am a rather inexperienced Linux developer. In all 4 years of commercial development I have used only Windows and Visual Studio, and building software for many platforms is not common for me. So I created a list of commands (as well as software) and other hints that help me debug my software on Linux. When I say Linux, about 75% of it applies to Mac OS too.

And yes, I am using Ubuntu 8.10.

  1. Gdb in text UI mode (Ctrl+X, then A).

    Usually I debug code using Netbeans, but there are situations where I need to debug code on a remote computer, and installing Netbeans or even setting up gdbserver would take too much time. So I just press Ctrl+X followed by A, and gdb switches to its simple text-based UI (TUI) mode, which shows the source next to the command prompt. Very useful.

  2. Ldd

    This program shows the library dependencies of an executable file. If the file does not run for some reason, the first thing to check is the ldd output. On Windows, dumpbin /dependents (included with VS) does something similar.

    gburanov@gburanov-ubuntu:~$ ldd /mnt/G:/Source/project/exe/lx/debug/english/schedmgr.exe
    linux-gate.so.1 => (0xb80b1000)
    libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xb806a000)
    librt.so.1 => /lib/tls/i686/cmov/librt.so.1 (0xb8061000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7f71000)
    libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xb7f6d000)
    libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7f47000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7f38000)
    libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7dda000)
    /lib/ld-linux.so.2 (0xb8097000)
    gburanov@gburanov-ubuntu:~$
  3. fuser

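    Shows which processes are using the file. On Windows I always used the Unlocker application for this (installed separately). In the output below, the e after the PID means that the file is being run as an executable.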

    gburanov@gburanov-ubuntu:~$ fuser /mnt/G:/Source/project/exe/lx/debug/english/schedmgr.exe
    /mnt/G:/Source/project/exe/lx/debug/english/schedmgr.exe: 17846e
    gburanov@gburanov-ubuntu:~$
  4. dmesg

    Prints the kernel message buffer: the boot messages and everything the kernel has logged since. Usually used together with grep. Example:

    gburanov@gburanov-ubuntu:~$ dmesg | grep eth0
    [ 4.552705] eth0: registered as PCnet/PCI II 79C970A
    [ 79.763326] eth0: link up
    [ 300.200990] Inbound IN=eth0 OUT= MAC=00:0c:29:d1:4e:30:00:00:aa:9e:33:95:08:00 SRC=10.2.144.8 DST=10.2.148.57 LEN=83 TOS=0x00 PREC=0x00 TTL=30 ID=6445 PROTO=UDP SPT=161 DPT=42372 LEN=63
    [ 300.210026] Inbound IN=eth0 OUT= MAC=00:0c:29:d1:4e:30:00:14:38:da:c4:85:08:00 SRC=10.2.148.62 DST=10.2.148.57 LEN=83 TOS=0x00 PREC=0x00 TTL=64 ID=61780 PROTO=UDP SPT=161 DPT=42372 LEN=63
    [ 300.222510] Inbound IN=eth0 OUT= MAC=00:0c:29:d1:4e:30:00:00:aa:9c:e1:3b:08:00 SRC=10.2.144.125 DST=10.2.148.57 LEN=83 TOS=0x00 PREC=0x00 TTL=64 ID=4541 PROTO=UDP SPT=161 DPT=42372 LEN=63
    [ 311.040399] eth0: no IPv6 routers present
  5. which, whereis, type

    which shows the full path to an executable file; whereis does the same and also shows the locations of manual pages and sources. type shows how the shell interprets a name, including aliases.

    gburanov@gburanov-ubuntu:~$ whereis ls
    ls: /bin/ls /usr/share/man/man1/ls.1.gz
    gburanov@gburanov-ubuntu:~$ which ls
    /bin/ls
    gburanov@gburanov-ubuntu:~$ type ls
    ls is aliased to `ls --color=auto'
    gburanov@gburanov-ubuntu:~$
  6. slocate

    Slocate is a more general version of which and whereis. It performs a quick search over a database that lists all files. The database is updated by a daily cron job. Example:

    gburanov@gburanov-ubuntu:~$ slocate trueimage
    /var/log/trueimage-setup.log
    /usr/sbin/trueimagecmd
    /usr/sbin/trueimageremote
    /usr/sbin/trueimagemnt
    /usr/lib/Acronis/Agent/libag_trueimage.so
    /usr/share/man/man8/trueimagecmd.8.gz
    /usr/share/man/man8/trueimagemnt.8.gz
    /home/gburanov/Acronis/ATI Echo 2/.trueimageconsole
    gburanov@gburanov-ubuntu:~$

November 13, 2008

Code review and gained experience

During code reviews I try to write down (to remember) all my defects (bugs in my code). Some of them are trivial (like “Function names must be listed in alphanumeric order” or “Use Pascal casing for member variables”), but some are not, and I want to list them here. Of course, one must understand that a code review finds only simple defects. If there is an error in the algorithm, it is not likely to be found during code review.

After three or four bugs caught in code review, you will probably gain enough experience to find these defects in your own code “on the fly”. The things noted here are well known. If you are an experienced developer, you probably know this stuff already. In that case, let it be just a note.

  1. Using C++ cast style

    Initially C++ was just a wrapper over C, so it accepts all C-style casts, whether they are safe or not. C++ has not banned the old cast style, but it introduced four new casts: const_cast, static_cast, reinterpret_cast and dynamic_cast.

    Example (incorrect):

    void* p = get_Widget_Pointer();
    int32 i32 = 123;
    int64 i64 = (int64)i32;
    Widget* w = (Widget*)p;

    Example (correct):

    void* p = get_Widget_Pointer();
    int32 i32 = 123;
    int64 i64 = static_cast<int64>(i32);
    // static_cast is enough to convert a void* back to the original pointer type
    Widget* w = static_cast<Widget*>(p);

    It is good style never to use the old C cast in new code.
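To make the difference between the four casts more visible, here is a minimal sketch. The Widget and Button types and all function names are made up for illustration; they are not from our project.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types, used only to demonstrate the casts.
struct Widget {
    virtual ~Widget() {}
    int id = 0;
};

struct Button : Widget {
    bool pressed = false;
};

// static_cast: well-defined conversions such as widening an integer.
inline std::int64_t Widen(std::int32_t value) {
    return static_cast<std::int64_t>(value);
}

// static_cast also converts a void* back to the pointer type it came from.
inline Widget* FromOpaque(void* opaque) {
    assert(opaque != nullptr);
    return static_cast<Widget*>(opaque);
}

// dynamic_cast: checked downcast, yields nullptr if the object is not a Button.
inline Button* AsButton(Widget* widget) {
    return dynamic_cast<Button*>(widget);
}

// const_cast: removes const; only safe when the object itself is not const.
inline int& MutableId(const Widget& widget) {
    return const_cast<Widget&>(widget).id;
}

// reinterpret_cast: reinterprets the bits, here a pointer as an integer.
inline std::uintptr_t AddressBits(const void* pointer) {
    return reinterpret_cast<std::uintptr_t>(pointer);
}
```

The nice side effect is that every dangerous conversion becomes easy to grep for, which is exactly what a reviewer wants.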

  2. Using smart pointers

    Managing objects with raw new/delete is possible but should be avoided. Smart pointers should be used wherever possible. In our code we use std::auto_ptr and boost::shared_ptr. auto_ptr is lightweight, but it must not be stored in STL containers and is risky as a class member, because copying it transfers ownership.

    Example:

    std::auto_ptr<Widget> ptr1(new Widget);
    std::auto_ptr<Widget> ptr2 = ptr1;
    // at this point the ONLY owner of the Widget object is ptr2; ptr1 "gave" the ownership to ptr2 and is now empty.

    boost::shared_ptr<Widget> ptr3(new Widget);
    boost::shared_ptr<Widget> ptr4 = ptr3;
    // at this point both ptr3 and ptr4 share ownership of the same Widget object.
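The same two ownership models can be sketched with the C++11 standard pointers. This is an assumption on my side: the sketch uses std::unique_ptr and std::shared_ptr rather than the auto_ptr and boost pointers from our code, and the Widget type and function names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical type, used only for the demonstration.
struct Widget {
    int value = 42;
};

// Exclusive ownership: moving transfers the Widget and leaves the source empty.
inline bool TransferEmptiesSource() {
    std::unique_ptr<Widget> first(new Widget);
    std::unique_ptr<Widget> second = std::move(first);
    return first == nullptr && second != nullptr;
}

// Shared ownership: after the copy both pointers refer to the same Widget.
inline long OwnersAfterCopy() {
    std::shared_ptr<Widget> first(new Widget);
    std::shared_ptr<Widget> second = first;
    assert(first.get() == second.get());
    return first.use_count();
}
```

Unlike auto_ptr, unique_ptr makes the transfer explicit through std::move, so an accidental copy does not compile.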

  3. Use const everywhere it is possible

    C++ has a const keyword, although many developers just forget about it. When designing every new function interface, one must always think about const.

    Example (incorrect):

    bool Processes::GetParentProcessId(PID& processId, PID& parentProcessId);

    Example (correct):

    bool Processes::GetParentProcessId(const PID& processId, PID& parentProcessId) const;
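A small sketch of what const buys you in practice. The Processes class below is a simplified stand-in that I made up for illustration (a map from child to parent); it is not the real class from our project.

```cpp
#include <cstddef>
#include <map>

typedef int PID;  // assumption: PID is a plain integer here

// Hypothetical, simplified process table.
class Processes {
public:
    // Not const: this member changes the table.
    void Add(const PID& child, const PID& parent) { parents_[child] = parent; }

    // const members promise not to modify the object.
    bool Contains(const PID& processId) const {
        return parents_.count(processId) != 0;
    }
    std::size_t Count() const { return parents_.size(); }

private:
    std::map<PID, PID> parents_;
};

// Takes a const reference, so only const members can be called here;
// calling processes.Add(...) in this function would not compile.
inline bool IsTracked(const Processes& processes, const PID& processId) {
    return processes.Contains(processId);
}
```

The compiler now checks the "read only" promise for you, so the reviewer does not have to.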
  4. Invalidate the function return in case of error

    This is a rule I disagree with, but it is still our project rule. If a function notifies the upper layer about an error using an error code, it must invalidate its output.

    Example (correct):

    bool Processes::GetParentProcessId(const PID& processId, PID& parentProcessId) const
    {
        parentProcessId = -1;
        // bla bla bla, somewhere here we fail
        // ...
        // ...
        return false;
    }

    When this function returns false, the parentProcessId output is invalid. This is done for the following reason: if the caller forgets to check the error code, he will not be able to continue working with parentProcessId (it may have held a valid value before GetParentProcessId was called).

    Example (the caller bug this rule protects against):

    PID processId = GetProcessId();
    PID parentProcessId = processId; // this line of code is stupid but possible
    GetParentProcessId(processId, parentProcessId);
    // we forgot to check the return code; without the invalidation parentProcessId would
    // still hold a valid-looking value, and this bug would be difficult to find
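Here is a compilable sketch of the rule. The free function, the map-based process table and the kInvalidPid sentinel are assumptions made for illustration; the real function is a member of Processes and talks to the operating system.

```cpp
#include <map>

typedef int PID;             // assumption: PID is a plain integer here
const PID kInvalidPid = -1;  // hypothetical sentinel for "no valid value"

// Looks up the parent of processId in a table of child -> parent pairs.
// On failure the output is always left as kInvalidPid.
inline bool GetParentProcessId(const std::map<PID, PID>& parents,
                               const PID& processId,
                               PID& parentProcessId) {
    // Copy the key first: the caller may pass the same variable for both
    // arguments, and invalidating the output would then destroy the input.
    const PID key = processId;
    parentProcessId = kInvalidPid;

    std::map<PID, PID>::const_iterator it = parents.find(key);
    if (it == parents.end()) {
        return false;
    }
    parentProcessId = it->second;
    return true;
}
```

A caller that ignores the return value now ends up with kInvalidPid instead of a stale but plausible process id.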

  5. Single return point concept

    There is no single rule about the single return point concept; different developers have different points of view on it. With a single return point it is easier to debug code (you can be sure that it is the only exit point from the function) and easier to free resources on exit. The second point can be solved by using smart pointers (see 2), but the first is still important.

    Example (correct):

    bool fun1(int a, int b, int c)
    {
        bool ret = false;
        if (a)
        {
            Do(b, c);
            ret = true;
        }

        return ret;
    }

    On the other hand, when there are many validations in the code, a function that follows the single return point concept becomes deeply nested and huge. It is better not to use the concept there.

    Example (incorrect):

    bool fun1(int a, int b, int c)
    {
        bool ret = false;
        if (a)
        {
            DoSmth();
            if (b)
            {
                DoSmthAgain();
                if (c)
                {
                    Do(b, c);
                    ret = true;
                }
            }
        }

        return ret;
    }

    Example (correct):

    bool fun1(int a, int b, int c)
    {
        if (!a)
        {
            return false;
        }

        DoSmth();
        if (!b)
        {
            return false;
        }

        DoSmthAgain();
        if (!c)
        {
            return false;
        }

        Do(b, c);
        return true;
    }
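The resource side of the argument can be sketched like this: with an RAII guard the cleanup runs on every early return, so multiple exit points stay safe. CleanupGuard and the extra cleanups parameter are hypothetical, added only so that the effect can be observed.

```cpp
// Hypothetical guard: its destructor stands in for "free resources on exit"
// and counts how many times the cleanup has run.
class CleanupGuard {
public:
    explicit CleanupGuard(int& counter) : counter_(counter) {}
    ~CleanupGuard() { ++counter_; }

    CleanupGuard(const CleanupGuard&) = delete;
    CleanupGuard& operator=(const CleanupGuard&) = delete;

private:
    int& counter_;
};

// Early-return version of fun1; the guard is destroyed on every path out.
inline bool fun1(int a, int b, int c, int& cleanups) {
    CleanupGuard guard(cleanups);
    if (!a) {
        return false;
    }
    if (!b) {
        return false;
    }
    if (!c) {
        return false;
    }
    return true;
}
```

Debugging is still harder with several exits, but at least nothing leaks.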

November 11, 2008

C++ and the process of code review

One of the big differences between C++ and Java or .Net Framework, for example, is the way you write your code. .Net Framework and Java each have their own code style that is part of the language, and everybody who writes code must obey it. For example, exceptions are a must, and OOP is a must.

C++ is more flexible and allows the programmer to write code any way he wants. If it is a low-level API, it can use function return values instead of exceptions, and if you prefer functional programming over OOP, it's your choice. If you want, you can still use macro programming; there is #define.

The downside of this design is that each project still has to limit these possibilities itself. For example, in C++ you can report errors using return codes or using exceptions, but in our low-level project we must not use exceptions. Since the compiler will not enforce such a rule, an "eye" review by another developer must be done before the code is committed. You see, what is prohibited at the language level in Java and .Net Framework is allowed in C++, and this leads to more reviewing time.

Here is an example of C++ code review in our project:

void Task::execute(const TaskParames& params)
{
    if (!Validate(params))
    {
        throw bad_param_exception(L"Params validation not passed");
    }

    executeInternal(params);
}

This code will not pass code review. The correct code is:

bool Task::execute(const TaskParames& params)
{
    bool ret = false;
    if (Validate(params))
    {
        ret = executeInternal(params);
    }

    return ret;
}

This is just one example of how the code style in one project can differ from the code style in another. So every developer should spend some time understanding the project coding rules. In .Net Framework this is also true, of course, but there returning errors by return value is discouraged at the level of the language, not at the level of the project, so every developer who knows .Net Framework knows about the rule.
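When a project with a "no exceptions" rule has to call code that throws, one common approach is a thin wrapper at the boundary. A minimal sketch, assuming a hypothetical throwing helper ParsePositive; neither function is from our project.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper written in the "exceptions" style.
inline int ParsePositive(const std::string& text) {
    int value = std::stoi(text);  // throws if text is not a number
    if (value <= 0) {
        throw std::out_of_range("value must be positive");
    }
    return value;
}

// Boundary wrapper in the "return code" style: no exception escapes,
// and the output is set to -1 on failure.
inline bool TryParsePositive(const std::string& text, int& value) {
    value = -1;
    try {
        value = ParsePositive(text);
        return true;
    } catch (const std::exception&) {
        value = -1;
        return false;
    }
}
```

The rest of the project then sees only the bool, whatever the library underneath prefers.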

The problem is that the example I showed you is easy and simple. But there are cases where it is not so easy to say whether we should allow something or deny it. For example, macros are generally prohibited in our code (as in most C++ code), but if you are familiar with C++, you know there are places where it is easier (and more beautiful) to use a macro instead of another solution (a function, a template or whatever).
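One such place, as a sketch: capturing the text of an expression together with the file and line where it was written. A function cannot do this, because it only sees the value. The REVIEW_CHECK macro and the CheckResult struct are hypothetical names made up for this post.

```cpp
#include <string>

// Result of a check, including where and what was checked.
struct CheckResult {
    bool passed;
    const char* expression;
    const char* file;
    int line;
};

// #expr turns the argument into a string literal; __FILE__ and __LINE__
// expand at the place where the macro is used, not where it is defined.
#define REVIEW_CHECK(expr) \
    (CheckResult{static_cast<bool>(expr), #expr, __FILE__, __LINE__})

// Builds a message such as "main.cpp:12: a == b failed".
inline std::string Describe(const CheckResult& result) {
    return std::string(result.file) + ":" + std::to_string(result.line) +
           ": " + result.expression + (result.passed ? " passed" : " failed");
}
```

Whether this is allowed in a given project is, again, a question for its coding rules and its reviewers.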

November 7, 2008

Pros and cons of low level languages

Some time ago I talked about a DST problem with one of the Java developers, and the first thing he said to me was: “Hey, bullshit, I guess you don’t even have a single class for the string”. I think the difference between C++ and Java comes down to the topic of “high and low level languages”.

C++ was initially designed this way: the language is just an OO wrapper over plain C, nothing more. The initial design was to keep the language as simple as it is and move the implementations of all concepts to libraries. That’s why we don’t have a string data type, like in .NET Framework or Java. That’s why we don’t have an operator to double a value or to get the square root of a value. That’s why C++ without libraries is just nothing.

Of course we have the STL, the standard template library that is part of C++. It has the std::string class (for ANSI strings) and std::wstring (for Unicode strings), and in 99.9% of cases these are the classes you need to work with strings. But take even the concept of time and let’s look at what we have in C++.

Type        Standard                Representation
time_t      ISO time standard       seconds since 1970-Jan-01
FILETIME    Windows time standard   ticks (1 tick = 100 nanoseconds) since 1601-Jan-01
SYSTEMTIME  Windows time standard   structure with date, time, year and so on
tm          ISO time structure      structure with date, time, year and so on
UDate       ICU time                milliseconds since 1970-Jan-01

Moreover, we have the Mac OS time format, the Java time format and so on... It is common for a third-party library to invent its own standard for the time concept, and if you use that library in development, you need to write conversion routines. At this point you want to cry out: “Oh, C++ is horrible, every guy is inventing his own bicycle”.
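To give a feeling for what such conversion routines look like, here is a sketch that converts between seconds since 1970 and 100-nanosecond ticks since 1601. It works on plain 64-bit integers instead of the real time_t and FILETIME types, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// 1601-Jan-01 to 1970-Jan-01 is 134774 days of 86400 seconds each.
const std::int64_t kSecondsBetween1601And1970 = 11644473600LL;
const std::int64_t kTicksPerSecond = 10000000LL;  // 1 tick = 100 nanoseconds

// Seconds since 1970-Jan-01 -> ticks since 1601-Jan-01.
inline std::int64_t UnixSecondsToFileTimeTicks(std::int64_t seconds) {
    return (seconds + kSecondsBetween1601And1970) * kTicksPerSecond;
}

// Ticks since 1601-Jan-01 -> seconds since 1970-Jan-01.
// The sub-second part of the tick count is dropped.
inline std::int64_t FileTimeTicksToUnixSeconds(std::int64_t ticks) {
    assert(ticks >= 0);
    return ticks / kTicksPerSecond - kSecondsBetween1601And1970;
}
```

Every pair of formats in the table above needs something like this, which is where the bicycles start to multiply.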

Pros

But this is not always as bad as you might think, and it is one of the reasons why C++ has been so popular for more than 20 years. Let’s imagine a string had been added to C++ as a data type at the beginning of the 80’s. That was a time when nobody was thinking about localization and globalization, so I guess it would have been an ANSI string. Then, with the invention of Unicode, this string (a basic data type!) would have become obsolete and would have needed to be banned. That’s shit. Most things are subject to change, so their implementation must live in a library, not in the language.

Another reason why we have so many time concepts is the search for a trade-off between memory and usability. Java is a new, modern language, and someone writing a Java application will not use the same memory optimizations as a C++ developer does, even now. I just want to point out that the Java date is a 64-bit data type, while the C ISO time is a 32-bit data type. Sometimes, even if the initial design of a library is perfect, it can become obsolete. Java is a young language and I think (I am not sure, I haven’t actually tried it) it does not have many obsolete classes, but that is just a matter of time.

Also, you need to note that even a string can’t be a common concept when we are talking about low-level programming. Everybody needs his own trade-off between speed, memory and ease of use. For example, the Windows core team does not use std::vector but its own DynArray class (http://sim0nsays.livejournal.com/31460.html).

Cons

The cons of this concept are, yes, too many reinvented bicycles. Almost every huge project has its own implementation of even basic concepts. One of the popular mistakes in our company (at least among newbies) is not using the primitives that our own developers have already written.

For example, one can write:

std::string guid1 = "0FEC22BC-64B1-4f98-AD93-5B870095C911";
std::string guid2 = "0FEC22BC-64B1-4f98-AD93-5B870095C912";
assert(guid1 != guid2);

But this code will not pass review, because we need to use our special Guid class:

Common::Guid guid1 = "0FEC22BC-64B1-4f98-AD93-5B870095C911";
Common::Guid guid2 = "0FEC22BC-64B1-4f98-AD93-5B870095C912";
assert(guid1 != guid2);

A .Net developer, for comparison, will no doubt use the Guid class from the .Net Framework class library:

Guid g = Guid.NewGuid();

He knows about it because it is a well documented part of .Net Framework. Because it is part of the standard, the time needed to understand code written by another developer decreases.
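One reason a dedicated class can be better than std::string: the same GUID can be written in upper or lower case, and a plain string comparison treats those spellings as different values. The Guid class below is a hypothetical sketch, not our real Common::Guid, and it does not validate the format.

```cpp
#include <cctype>
#include <string>

// Hypothetical GUID wrapper that compares values case-insensitively.
class Guid {
public:
    explicit Guid(const std::string& text) : text_(Normalize(text)) {}

    bool operator==(const Guid& other) const { return text_ == other.text_; }
    bool operator!=(const Guid& other) const { return !(*this == other); }

private:
    // Stores the text in upper case so that comparison ignores case.
    static std::string Normalize(std::string text) {
        for (std::string::size_type i = 0; i < text.size(); ++i) {
            text[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(text[i])));
        }
        return text;
    }

    std::string text_;
};
```

With std::string, "…4f98…" and "…4F98…" would compare as different; with the class they are the same identifier.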

I need to note that C++ has the boost library, which is not yet part of C++ but parts of which are going to become part of it (at least this is being discussed now). Boost sometimes fixes the problem of reinventing bicycles.

November 1, 2008

Remote debugging for Mac Os X (part 3)

As you probably know, Mac OS X is partly based on FreeBSD Unix, so the methods of debugging are pretty much the same as for Linux. We need to mount the sources and the symbol table and then set the correct paths to them.

However, the mounting process differs slightly from Linux. Mac OS X has a GUI for all the mounting. Switch to Finder and press Command+K to open the connection dialog.

Then choose the folder to mount (the folder with the source code).

It is also possible to mount the folder automatically on logon. To do this, open System Preferences, click the Accounts icon, choose the Login Items tab and add the folder you have mounted.

Now, about Mac OS X gdb. It is a modified version of Unix gdb. Because of the modifications, it does not understand some gdb commands, including the PATH command. This means that creating a .gdbinit file will not help. The only workaround is to recreate the folder structure starting from the “/” root. To do this, type the following:

sudo mkdir /cygdrive
sudo mkdir /cygdrive/g
sudo ln -s /Volumes/Source /cygdrive/g/Source

Now you are ready to do all the debugging stuff using gdb.


Speaking of GUIs, you can use XCode, which is installed with Mac OS X (I will not go into details, the interface is fairly intuitive), or, alternatively, you can use Netbeans (if you are used to it from Linux).