Category: Tutorials

Void pointers in C

Void pointers are pointers pointing to some data of no specific type.

A void pointer is defined like a pointer of any other type, except that void* is used for the type:

void *pt;

You can’t directly dereference a void pointer; you must cast it to a pointer with a specific type first, for instance, to a pointer of type int*:


Thus to assign a value to a void pointer, you will have to do something like:

*(float*)pt=3.14; /* You can assign a value of any type to the pointer */

The use of void pointers is mainly allowing for generic types. You can create data structures that can hold generic values, or you can have functions that take arguments of no specific type. If you wanted a linked list allowing for generic values, you would define your list node like this:

struct ListNode{
  struct ListNode *next;
  void *data;

A generic function for doubling a value might look like the following:

#define TYPE_INT 0
#define TYPE_FLOAT 1
void doubleVal(int type, void *var){
  } else if(type==TYPE_FLOAT){

Called, rather obviously, like the following:

doubleVal(TYPE_INT, &integer);
doubleVal(TYPE_FLOAT, &floatingPoint);

You can’t perform pointer arithmetic on void pointers since the compiler doesn’t know the size of the data which is pointed to, however, you may cast a void pointer to a pointer of some other type and perform pointer arithmetic on that.

The most useful GCC options and extensions

This post contains information about some of the most useful GCC options. It is meant for people new to GCC, but you should already know how to compile, link and the other basic things using GCC; this is no introduction. If you want an introductory tutorial to GCC, just google for that.

The things this article attempts to cover include (as well as a few other things):

  • Optimization
  • Compiler warning options
  • GCC-specific extensions to C
  • Standards compliance
  • Options for debugging
  • Runtime checks (e.g. stack overflow protection)

For more information on GCC, the freely available An Introduction to GCC book is pretty good. A manual with over 700 pages is available as well (it’s a reference, not a tutorial, though) from the GCC website. The manpages for GCC (man gcc) can also be useful.

Basic compiler warning and error options

An option many programmers always use while compiling C programs is the -Wall option. It enables several compiler warnings not enabled by default, such as -Wformat warning at incorrect format strings. To enable even more warnings, use the -Wextra option. All warnings can be turned off with -w. More warnings will make catching eventual bugs easier, but it may also raise the amount of false-positives. The exact implications of these different options, as well as individual warnings can be found here.

To treat compiler warnings as errors, use the -Werror option. To stop compilation when the first error occurs, use -Wfatal-errors.

Standards compliance

By default, GCC may compile C code that is not necessary standards-compliant, or it might not even compile code that complies to the C standard (“the C standard” here means either C89 or C99). Some C standard features are disabled by default, such as trigraphs (can be enabled with -trigraphs), and several GCC extensions (will be talked about later on in this article) will work, even if they aren’t parts of the official C standard.

The -ansi option can be used to make GCC correctly compile any valid C89 program (if not, it is due to a compiler bug). It will still accept some GCC extensions (those that aren’t incompatible with the standard); use the -pedantic option to make GCC a pedant when it comes to standards compliance. The -std= option can be used to set the specific standard. There’s a bunch of supported standards (and most standards have several valid names), but the important ones to know are c89 (equal to the earlier -ansi option), c99, gnu89 (C89 with GCC extensions, which is the default) and gnu99 (C99 with GCC extensions). You can also use c1x to enable experimental support for the upcoming C1X standard (or gnu1x for the same with GCC extensions).

Code optimization levels

You can set the code optimization level for GCC, which decides how aggressively GCC will optimize the code. By default, GCC will try to compile fast, thus no optimizations will be made. By setting an optimization level, GCC will spend more time compiling, and the code might be harder to debug as well, but optimize the code better, possibly resulting in a faster executing program and/or smaller binary filesize. Because of longer compile-time and possible complications making debugging harder, it can be a good idea not to optimize during the development process and wait with that for building the production binary.

The default optimization(less) level is set with the -O0 option, or by giving no optimization option at all.

Some of the most common optimization forms can be activated by using the -O1 (or simply just -O) option. This option tries to produce smaller and faster binaries, and in many cases it can compile faster than -O0 because some optimizations will simplify the program for the compiler as well.

The next level is, perhaps unsurprisingly, -O2. It tries to improve the speed of programs even more than -O1 does, without increasing the size. It can take a much more considerable amount of time to compile. This is recommended for production releases as it optimizes well for speed without sacrificing space.

The -O3 level does some of the heaviest, most time consuming optimizations. It may also increase the size of the binary. In some cases, the optimizations may backfire and actually produce a slower binary.

If you want a small binary (most of all), you should use the -Os option.

Just pre-process, compile or assemble

When you ask GCC to compile a C program, the following steps are usually taken:

  1. Pre-processing
  2. Compilation
  3. Assembling
  4. Linking

For different reasons, you may want to stop at some of the steps. You might just want to pre-process, for instance, to find an error you suspect comes from a faulty pre-processor directive. If you do so, you will see the output from the pre-processor instead of getting the complete finished binary. Likewise, you may wish not to link because you are going to link manually later on, or maybe you just want to get the assembly output, modify that in some way and then manually assemble and link it. The reasons why you would want to do that isn’t the important thing here though, but how to do it.

To only pre-process, you should use the -E option. To stop after compilation, use -S. To do all steps but linking, use -c.

Controlling assembly output

Normally GCC produces AT&T syntax assembly output, but if you want to use Intel’s syntax (which is, in my opinion, much more readable), you should set the assembly dialect with the -masm= option, with intel as the value (-masm=intel). Note that this won’t work on Mac OS X.

A useful option for making the Assembly code more readable is the -fverbose-asm, which adds comments to the assembly output.

Adding debug information

If you are going to debug your program later, and don’t want to debug the assembly version, the -g option is absolutely essential. It adds debugging information, so that you can do source-level debugging later on the binary. The -g option produces debug information specifically for GDB, so what you get will not necessarily work on other debuggers.

You can set the level of debug information to generate. The default level is 2. With -g1 you can inform GCC to produce minimal debugging information, and with -g3 you can tell GCC that you want even more debug information than what you get by default.

Adding runtime checks

GCC can add different runtime checks to C programs at compilation, making debugging easier and avoiding some of the most common security vulnerabilities in C programs (as long as vulnerabilities/bugs don’t exist in the checking…). Note that runtime checks can degrade performance of programs.

There is an incredibly useful GCC option, -fmudflap, which can be used to find pointer arithmetic errors during runtime. This can help you find many pointer arithmetic related errors.

Stack overflow protection can be enabled by using -fstack-check.

The -gnato option enables checking for overflows during integer operations.

GCC extensions to the C language

GCC provides several extensions to the C programming language that aren’t actually parts of the C standard. You should always be careful while using non-standard features as that would, in most cases, make your code incompatible with other compilers. Anyway, I will cover some of the most useful extensions GCC provides, and you decide if you use them or not.

All extensions can be found in the GCC documentation.

Likely and unlikely cases

One GCC extension frequently used in the Linux kernel is the GCC extension __builtin_expect option, commonly known as the likely() and unlikely() macros. The Linux kernel would use something like the following for telling GCC which if statements are likely and unlikely to execute, so that GCC can do better branch prediction:

/*This is the likely case which will occur most of the time*/
  return 1;
/*This is the unlikely case which will occur much more
 *seldom than the earlier case*/
  return 0;

The likely() and unlikely() macros are defined in the Linux kernel as:

#define likely(x)       __builtin_expect((x),1)
#define unlikely(x)     __builtin_expect((x),0)

If you want to use this outside the Linux kernel, you could always type __builtin_expect(condition, 1) for likely cases and __builtin_expect(condition, 0) for unlikely cases, but it would be much easier to use the same macros as the Linux kernel uses.

Additional datatypes

GCC provides some additional datatypes to the C programming langauge not defined by the standard. These are:

Note that 80- and 128-bit floating point values are not supported on all architectures (they are supported on common x86 and x86_64 systems, though). On ARM platforms, half-precision (16-bit) floating points are supported.

Ranges in switch/case

A GCC extension provides the support for ranges in switch/case statements, so you can have a case for values between 10 and 1000, for instance. A range is defined as x ... y, where x is the lower-bound and y is the upper-bound. You may not leave the spaces before and after the dots out. An example switch statement using cases with ranges:

  case 0 ... 9:
    puts("One digit"); break;
  case 10 ... 99:
    puts("Two digits"); break;
  case 100 ... 999:
    puts("Three digits"); break;
    puts("I sense a disturbance in the force");

This is more convenient than writing 1000 different cases (but you wouldn’t solve the problem like that, would you?).

Binary literals

GCC supports binary literals in C programs using the 0b prefix, pretty much like you would use 0x for hexadecimal literals. In the following example, we initialize an integer using a binary literal for its value:

int integer=0b10111001;

The J Programming Language: An introduction and tutorial

I have been reading about the J programming language lately. It is quite different from many other languages, but I will try to cover the absolute basics of it, while, inevitably, much is left out. In the end, this article tries to give you a feel of the language and get you started, other resources are available for achieving higher proficiency in the language.

The language has very good resources available, so kudos for that. Freely available books include Learning J and J for C Programmers.

J is an offspring to the APL programming language. One of the most important changes from APL to J is that you can now use a standard keyboard (without any extra character map or similar trick) to type the code, as J, unlike APL, uses only pure ASCII characters.

J is a member of the array programming paradigm. Other notable paradigms J is a member of are the functional, function-level and tacit programming paradigms.

Just recently, in March 2011, J was released as open source software under the GNU GPL v3.

J is suitable for mathematical and statistical computation. It also happens to be good for code golf, as a bonus, because the language is very terse. J provides standard libraries, an integrated development environment, bindings to other programming languages and built-in support for 2D as well as 3D graphics. Plotting features exist, and I believe J could be used as a Maxima or Mathematica replacement in some cases.

J can be run interactively, which greatly helps while learning and/or trying out new techniques. It is recommended that you download and install J on your system and try out the material in this article in order to achieve a better understanding of the language.

Basic symbols

The + symbol is used to add two numbers together, one on the right and one on the left, e.g. 8 + 4. Spaces are optional (in this case), and can be used to increase clarity. Subtraction uses the - symbol and multiplication *. Division uses %, not /. C and most other common programming languages use / for division and % for modulus, but J doesn’t (| is used as modulus in J).

You can chain symbols for more complex expressions, e.g. 2 * 3 + 4. You might think that this would produce the result 10, as multiplication usually has higher precedence than addition, but J doesn’t have any precedence rules, and evaluates everything from the right to the left (this is also unlike mathematics, where evaluation happens from the left to the right when operators have the same precedence), thus, 2 * 3 + 4 evaluates to 14. Symbols lack precedence rules in J because J has hundreds of them, and it wouldn’t be fair to have programmers remember the precedence rules for all of them. You can, however, use parenthesis to change precedence just like in mathematics. To make 2 * 3 + 4 evaluate to 10, type (2 * 3) + 4.

Negative numbers are prefixed with an underscore (_), not a hyphen as in most other languages. 9 -4 will evaluate 9 subtracted by 4, while 9 _4 is a list containing the elements 9 and negative 4 (lists will be explained later).


Comments in J are started with NB. and continue towards the end of the line. An example is NB. This is a comment (don’t miss the dot in NB.).

Monads and dyads

Monads are functions with values on just the right side, while dyads are functions with one or more values on both sides of the function. Symbols often have separated monadic and dyadic cases, where the symbol will act in different ways if the symbol is used as a monad or as a dyad. As an example, - used as a dyad (x - y) is the familiar subtraction symbol, but used as a monad it is the negation symbol (e.g. - 9 _4) negates the two values, returning _9 4.


Lists are one-dimensional arrays. A list can be constructed by placing values separated by spaces after each other, as in 2 9 4. 2 9 4 is a list containing 2, 9 and 4. We can add something to each value of the list by typing 2 9 4 + 42; here we add 42 to each item, producing a new list 44 51 46. You can also type something like 1 2 3 + 4 5 6, which will produce 5 7 9 (1+4, 2+5, 3+6).

To get the length of a list, use the # symbol (as a monad), followed by a list.

Verbs and adverbs

J categorises many parts of the language into classes that are much like the classes of different natural languages. Among these are verbs and adverbs, which relate to functions. A verb is pretty much a normal function, such as +. An adverb is a special function that takes another function from the left and changes the way that function gets applied. An example of such a function is /.

/ takes a verb and then applies that to all the elements of a list. For instance, + / can be used to sum the values of list elements as + / 8 4 2 4. Likewise, you can multiply all elements together with * / and so on.

Another example is ~, which can be used to swap left-side and right-side arguments. For instance, 7 %~ 9 divides 9 with 7, not 7 with 9, as the sides the arguments resided on were swapped.


J uses =: for assignment (and =, not ==, for equality comparison). We can, for instance, assign the value 60 to the identifier x, by typing x=:60. Note that the equals sign comes first, which differs from the standard practice of putting the colon first; something that had me confused for a while.

Tables and multidimensional arrays

Using the dyadic $ symbol, you can construct multidimensional arrays. Arrays that have exactly two dimensions are called tables.

We can construct a table using 3 3 $ 1 2 3 4. The numbers on the left side specify the length of the different dimensions. Each number in that array represents the length of a specific dimension, and the amount of numbers sets how many dimensions will be used in the array. With 3 3, we create a 3 by 3 table. The numbers on the right specify the values which will be used as elements in the array. A 3 by 3 array would contain 9 elements, but we only specified 4 in the right-hand array. In that case, the array values we have will be repeated, forming the following table:

1 2 3
4 1 2
3 4 1

If we want to loop some values in a list, we can also use $ for that, e.g. 10 $ 1 2 3, producing 1 2 3 1 2 3 1 2 3 1.


J is definitely the coolest programming language I have touched recently (and among the languages I have recently played with include hyped languages such as Go, Haskell, Scala, D and Clojure, many which impressed me, but not as much as J). I look forward to learning more about J, and I hope you will do so as well.

I have not yet used it for anything non-trivial, and I will see how well it will do in such tasks, but so far, I’m pretty damn impressed.

The Vim options you should know of

Vim is a highly configurable editor, and if you can’t configure something as you want, plugins are available. While this post won’t cover plugins, it will cover the most useful options you can set in Vim. This is meant for beginner users of Vim; advanced users probably know all of the following already.

To make options permanent, you can modify your ~/.vimrc file. Type vim ~/.vimrc, type in the commands (one per line) and save.

Line numbers

Showing line numbers in a paragraph to the left can prove useful when programming. To do so, use :set number. If you want to disable that, just type :set nonumber.

Highlight search results

To enable highlighting of search results, type :set hlsearch, and to disable, type :set nohlsearch.
:set hlsearch

Auto- and smartindent

Autoindent will put you at the same indention as the previous line when opening a new line below. To enable autoindent, use :set autoindent, and to disable it, use :set noautoindent.

Smartindent is more advanced, and will try to guess the right indention level in C-like languages (only). To enable smartindent, type :set smartindent, and to disable it, type :set nosmartindent.

Change colorscheme

Use the :colorscheme {name} command to set the active colorscheme for Vim to {name}.

Adjust colors for dark or bright backgrounds

If you have trouble reading text as it is too dark on your dark terminal background, you might want to use the :set background=dark option.

Result of :set background=dark:
:set background=dark
Result of :set background=light, and also Vim’s default:
:set background=light

To set colors for a bright background, use :set background=light.

Enable incremental search

To enable incremental search, type :set incsearch. With incremental search enabled, Vim will find results as you type.
Incremental search
Incremental search can be disabled with :set noincsearch.

Ignore case in searches

If you want to ignore case in searches (e.g. be able to search for vim and find Vim, VIM and other variations), you can use :set ignorecase. To start treating letters with different case as different letters again, type :set noignorecase.

Enable spell checker

To enable Vim’s spell checker, type :set spell. For help on the commands to use with the spell checker, type :help spell. You can disable the spell checker with :set nospell.

Vim's spell checker

The spell checker doesn’t check spelling in code, just comments and non-code files.

Bytes and bitwise operators in C

Bitwise operations have many uses. I asked a question a few months ago at, where I was taught that. The answer which I accepted contained the following list of uses (credit goes to user whatsisname):

* Juggling blocks of bytes around that don’t fit in the programming languages data types
* Switching encoding back and forth from big to little endian.
* Packing 4 6bit pieces of data into 3 bytes for some serial or usb connection
* Many image formats have differing amounts of bits assigned to each color channel.
* Anything involving IO pins in embedded applications
* Data compression, which often does not have data fit nice 8-bit boundaries.\
* Hashing algorithms, CRC or other data integrity checks.
* Encryption
* Psuedorandom number generation
* Raid 5 uses bitwise XOR between volumes to compute parity.
* Tons more

I could myself further add the following:

  • You can improve performance
  • You can decrease system memory usage

Since I asked the question, I have played around with them, and additionally, I have started programming at a much lower level than before. I will now present an introduction to bitwise operators in C and a refresher about binary numbers. In the end, I will show and explain an implementation of a boolean datatype, with 8 booleans per byte, that is, 100% of the bits will get used. Programmers who know about the bitwise operators can skip ahead to that.

Bits and bytes

As you (should) know, everything in most modern computers is stored as sequences of bits, often represented as 0′s and 1′s (though really it is about differences in electrical charges, but 1′s and 0′s are more handy representations). A byte consists of 8 bits, for instance 1001 1101. The first bit, read from the right, represents a 1, the second a 2, the third a 4, the fourth an 8 etc. In decimal, 1001 1101 would be (1*128)+(0*64)+(0*32)+(1*16)+(1*8)+(1*4)+(0*2)+(1*1), which is 128+16+8+4+1, or 157. Typing something like 1001 1101 can get a bit tedious after some time. Programmers don’t use base 10 to simplify this though (10 is not a power of 2), but base 16 (hexadecimal), which, as a power of 2, has a pretty neat relation between base 2 (binary).

You can represent every combination one byte might have using 2 numbers in hexadecimal. There are 16 numbers in hexadecimal (including 0), and each of the two groups which we divided the byte into earlier can be represented by one number. The first number, coming from 1001 would be 9 and the second, coming from 1101 would be D (13, hexadecimal uses the letters A-F when using numbers higher than 9). Let’s check this, 0x9D (programmers often prefix numbers represented in base 16 with 0x) would be (9*16)+(13*1)=144+13=157. Yes, that is correct.

Now, how do you represent things other than numbers? Well, RGB color values, for instance, are represented using 3 bytes, one for red, one for green and one for blue. Each has 256 different
combinations (a value of 0 means none of that color and 255 means all of that color), and thus RGB can be used to represent 256^3=16 777 216 combinations, roughly 16,8 millions. An example, some sort of grey, would be 0x888888.

Another example is characters. Characters are most of the time represented in ASCII or Unicode (Unicode is basically a superset of ASCII containing characters for many international alphabets etc.). ASCII has 128 characters (uses 7-bits). Different characters have different values, for instance, an upper-case A is 65 (0x41). In C, strings are null-terminated; strings (should) end with a null-character (ASCII character 0). The string ABC would have a 0x41 byte, followed by a 0x42 byte, 0x43 byte, ended with a 0x00 byte (the null character). In memory, you could have a chunk of 4 bytes representing that string, like 01000001 01000010 01000011 00000000. Note that there is nothing that says that this is four characters that should be used as a string, you could use it like one 32-bit integer, or something else. Memory has no meaning at the byte level.

Big- and little endians

There are two orders in which you store bytes in memory that are common in modern machines, these called big- and little endian. In big endian, the most significiant byte comes first, and in little endian, the least significant number comes first. Big endian is what we humans use most of the time, for instance, in 113, one hundred is the most significant number, and it comes first. One hundred thirteen would have been 311 if it was written in little endian form. Intel (and compatible) processors use little endian. Note that bits in bytes are stored the same way in both formats (the most significiant bit is first), only the order in which whole bytes come in collections of bytes is altered by this. A 32-bit integer with the value 303153 would be stored as 00 04 A0 31 in big endian form and as 31 A0 04 00 in little endian form.

C deals with endianness for you and you won’t have to worry about it. Only Assembly programmers do have to worry about this in general (and IT security folks and reverse engineers and some more).

Integer literals in C

C provides several options which you can use when writing down integer literals. We will only concern us with the ability to use different bases. If you type out an integer “like you normally would”, C will interpret it as a decimal value (e.g. 73). If it has a leading 0, C will think that it is a base 8 number (e.g. 0777). We are not concerned with base 8 numbers here, but some older computers used them. Hexadecimal literals start with 0x, followed by a value in hex, with no spaces in between (e.g. 0xF7).

Sadly, C has no way to define binary literals.

The bitwise operators

A bitwise operation is an operation that works on the individual bits of a byte (or a composition of bytes). We will now look at the bitwise operators which are available to C programmers.


C has two bitshift operators, left- and right bitshift, or << and >>, respectively.

The action of both of these is common. They shift n bits, either to the left or to the right (I will let you guess yourself which one shifts to which direction). Bitshift left by one step is the same as integer multiplication by 2, and bitshift one step to the right is the same as integer division by 2. The following (bitshift 3 steps to the left) is equal to integer multiplication by 8:

0x2A << 3 /*0x2A=0000000 00101010, (0x2A << 3)=00000001 01010000*/

Note that you set how many steps you want to do your bitshift on the right side of the operator. You must always set the amount of steps.

We can create a simple function for calculating the n:th power of 2 using this:

int powerOf2(int n){
  return (1 << n);

Bitwise AND

Bitwise AND takes two numbers and returns a new one, where 1's are located at and only at places where both values had 1's:

10011011 AND        11001100 AND
10110101            00011011
--------            --------
10010001            00001000

The operator is an ampersand (&). So we can for instance run the following:

19 & 18

We can break this doẃn to:

19 & 18=   00010011 &
           00010010   (=18)

So we can use this to find common 1's for two bytes, but what is that good for? We can for instance check if a number is odd by doing:


The only case when you get a non-zero number is when the 1 is "on", and that is the only case a number is odd. The following is this as a C function (it could also have been implemented using the modulo operator, as it is normally done):

int isOdd(int x){
  return(1 & x);

Bitwise OR

Bitwise OR, compared to bitwise AND, is happy if just one of the numbers has a 1 at a location (both may have it though). Some examples:

10011011 OR         11001100 OR
10110101            00011011
--------            --------
10111111            11011111

The operator for bitwise OR is a pipe (|). We can run something like the following in C:

35 | 92

Broken down:

35 | 92=   00100011 |
           01111111   (=127)

Bitwise XOR

The bitwise XOR (exclusive or) is like bitwise OR, but only one of the numbers can have a 1 at a given position in order to return a 1 at that position. Examples:

10011011 XOR        11001100 XOR
10110101            00011011
--------            --------
00101110            11010111

A hat (^) is the bitwise XOR operator in C.

Bitwise NOT

Bitwise NOT basically flips the value of each position (0 becomes 1 and vice verse). Examples:

11010111 NOT = 00101000
01011001 NOT = 10100110

It is the only unary bitwise operator in C and it is a tilde (~), which you should place in front of the value to "invert".

Booleans that use memory efficiently

In C, it is customary to use integers to store boolean values (no "real" boolean datatype exists), where 0 is false and everything else is true. So, as an example, 00000000 00000000 00000000 00000000 is false, and e.g. 00000000 00000000 00000000 00000001 is true. What do we do with the other 31 bits or the other 96.875% of the 4 bytes? Umm... nothing, actually. Using a char would be a bit better (24 bits better, actually), wasting "only" 7 out of 8 bits (87.5%). That is still not very good, one char has enough room for 8, not 1, boolean values. While this doesn't matter that much on modern home computers, on some embedded computers this is the difference between success and failure; a considerable difference. What we are going to do is to have a char variable, and then we will add functions allowing us to easily use that single variable as if it was 8 different boolean values.

We will use a typedef to create our boolean variable (which is just a char in disguise):

typedef char bool;

You can change this to something else than a char to allow for more values than 8.

We will also define macros for TRUE and FALSE:

#define TRUE 1
#define FALSE 0

Our boolean will act like an array, to some extent. A simple function to get one individual value would be the following, getBool, which takes 2 arguments; the variable and an "index":

bool getBool(bool boolean, int index){
  return ((boolean >> index) & 1);

What this does is basically to shift the value we are interested to the right, and then it uses AND to clear all values but the last, and if any value is left, we will have a 1, which is true, and else, a 0, false.

We will also have a function called setBool, with 3 arguments, a pointer to the boolean, the "index" of the boolean we want to access and the value we want to set at that position:

void setBool(bool *boolean, int index, bool value){
    *boolean=*boolean & ~(1<<index);
    *boolean=*boolean | (1<<index);

The following graphic explains line 3:

*boolean=*boolean & ~(1<<index);
Assume boolean is 01001000
Assume index is 3

00000001 << 3  = 00001000
00001000 NOT   = 11110111

01001000 &

The following graphic explains line 5:

*boolean=*boolean | (1<<index);
Assume boolean is 10001100
Assume index is 1

00000001 << 1  = 00000010

10001100 |

I apologise if my solution is suboptimal; I'm new to low-level programming. And that was it.

Pythonic Coding: Techniques for idiomatic Python code

It is possible to write Java, C etc. in Python; that is, you use Python as your programming language, but you write code as if it wasn’t Python but another programming language and all you do is adjust the syntax to avoid (syntax) errors and get the program to run. Preferably, your Python code should be like Python code, not just be written in Python. Code that is written like that is called pythonic code; good Python code can be called pythonic. In this post, we will look at various techniques and idioms of the Python programming language that are typical in pythonic code.

Many techniques unique to Python, or at least not very common in other languages, are presented here. They are however not explained in very great detail and in such cases, links will be provided to more detailed documentation.

Use ‘in’ properly

A highly unpythonic way to iterate and print values in a list would be:

l=[8, 32, -53, 4]
for i in range(len(l)):
  print l[i]

This might be acceptable in C or similar languages because of the lack of any better way to do it, but it is bad in Python, since a superior way is provided. The pythonic and encouraged way would be:

l=[8, 32, -53, 4]
for i in l:
  print i

This is much simpler and clearer (these two are usually related). The important thing on the second line in the pythonic code example is that the in keyword is used correctly. When the in keyword is used with the for statement, it will go through every one item of an iterable object (one per iteration of the for loop), and i (or whatever variable you use) will will contain the value for the current object in the iterable. Because of this, we can also change the third line from print l[i] to print i.

It is good to know that the range() builtin returns an iterable object with values from something (default is 0) to the integer given as an argument. In the unpythonic code, we basically iterate through 0, 1, 2 and 3 and use those numbers as indexes for different items in our list. In the pythonic code, we “directly” iterate through the list.

Avoid parenthesis around conditional statements and loops

The following is correct Python code:


It is possible to leave the parenthesis out, and you should do so. The parenthesis are considered a code bloat, and the pythonic way to write that if statement would be:

if True==False:

While C, Java and many other languages do require the parenthesis, Python doesn’t, so don’t use them (the reason the former is allowed is because the expression will just be treated as a parenthesized expression).

Checking for one out of multiple conditions

One way to check if, say, the variable x is equal to 8, 0 or 6 using just one single if statement is to write:

if x==8 or x==0 or x==6:

That code is, among other things, ugly and harder to maintain than how it should be. The in keyword will help you here. Using a list with in, you could have the following code, meaning exactly the same as the above, but with several advantages (pythonic and easier to read and maintain):

if x in [8, 0, 6]:

It iterates through every one of the list items, and in case one (or more) of them match, the statement evaluates to True. The pythonic way here can only replace or, not and or any other operators bearing similar roles.

Use properties

Don’t write this kind of class in Python:

class Clock(object):
  def __init__(self):
  def setHour(self, hour):
    if 25>hour>0:
      raise BadHourException
  def getHour(self):
    return self.__hour

These traditional getters and setters should be replaced with properties in Python. To replace hour with a property, we can use the property() builtin. property() takes up to 4 arguments, the first being a function which is used as a getter, the second a function which is used as a setter, the third a deletion function and the fourth a docstring to be used for the property.

To upgrade our last example to use properties, we can use the following:

class Clock(object):
  def __init__(self):
  def __setHour(self, hour):
    if 25>hour>0:
      raise BadHourException
  def __getHour(self):
    return self.__hour
  hour=property(__getHour, __setHour)

Note how the old getters and setters are now hidden. Before, we would have used something like the following to manage the hour variable:

print time.getHour()

This is what Java code could (and even should) look like, but Python isn’t Java and Java isn’t very pythonic. Our later implementation, which uses properties, would work like the following:

print time.hour

When we assign to time.hour at the second line, Python actually calls our __setHour() method, and the value we are giving it becomes the first (second if you include self) argument of that method. Likewise, when we ask for the value at line 3, we are given the return value of the __getHour() method, not necessarily the actual value of __hour.

Note that properties work only on new-style classes (classes that extend object).

List comprehensions

List comprehensions is one of my favorite Python features. If you were tasked to write a program that returns all odd values from a list, you might type something like the following if you were unaware of this feature:

l=[4, 2, 1, 9, 14, 25]
for i in l:
  if i%2==1:

Using list comprehensions, you could type this on one line if we exclude the definition of the list:

new=[i for i in l if i%2==1]

The syntax might seem scary at first, but it isn’t, actually. Let’s break it down a bit. If you type the following in a Python interpreter, you will get the whole list:

[i for i in l]

This iterates through l, and for each iteration, it adds the value of i to a new list, and finally, when the iteration has finished, the new list is returned. We can add a statement which specifies a condition that each item must satisfy in order to be included in the new list. This is the if i%2==1 part.

That was the most basic form of a list comprehension. List comprehensions are extremely powerful, and you can do much more complex stuff than this. See the following for examples of complex list comprehensions.

Use enumerate()

You have a list of 5 names:

names=["Peter", "Jack", "Lucy", "Lucas", "Linus"]

You want to print them as a numbered list like the following:

1. Peter
2. Jack
3. Lucy
4. Lucas
5. Linus

The problem is that you need to know the index of the name you are printing in the list, but by using for i in names, you won’t know that, and using range() for this purpose is not just pythonic. Don’t fear, enumerate() solves this problem in a pythonic fashion:

for i, name in enumerate(names, 1):
  print "{0}. {1}".format(i, name)

The first argument which we give to enumerate is an iterable. The second argument is optional (it defaults to 0), and it sets the value at which to start the enumeration (we want to start from 1, thus we used that).


Generators in Python is an advanced and powerful feature that, when correctly used, can ease the coding of many tasks. Let’s implement a custom range function (it will have less features than Python’s builtin, note that while is used instead of a for i in range(n) as we want to implement our own range-like function without using the builtin):

def myRange(length):
  while n<length:
  return result

The 3rd line is highlighted; we don’t want that line; we don’t want a temporary list. This is where generators are useful. Instead, we could have used the following code for our range implementation:

def myRange(length):
  while n<length:
    yield n

The nice trick here is the yield keyword; it adds a value to a list, which is later on returned as a generator object (note the lack of an explicit return statement, the generator is returned when the function has finished its execution). Generator objects can, for instance, become iterators, now allowing us to use our own range implementation instead of the one built into Python:

for i in myRange(7):
  print i

Some final things

To swap two values, don’t use a temporary variable, instead use:

a, b=b, a

You don’t have to use the readlines() method of files to iterate through files, instead you can directly iterate with a for loop:

for line in file:
  #do something with line

If possible, you should call the open() function at the same line as the for statement:

for line in open("file.txt"):
  #do something with line

Use tuple unpacking to return multiple values from a function:

def doubleUs(a, b)
  return a*2, b*2
x, y=doubleUs(4, 9)

The idiom for an infinite loop in Python is a while loop with the expression 1 (not True):

while 1:
  #Do things


Python is a language with many great features. These features are the things that make Python Python and not “just another programming language”. These features and idioms are pythonic and define the Python way of doing things. This was just an introduction to these, and there is much more to the word pythonic than these features.