Nikolai Obedin


  1. Home automation with Raspberry Pi and Rust

    Feb 19, 2017 in Computers

    Recently I came up with an interesting idea that finally puts my old dusty Raspberry Pi to work: home automation. More specifically, I’d like to control the lights and measure the current temperature and humidity in my house. Of course, lots of people have done this many times before, and there are plenty of resources, tools and libraries that make it extremely easy to achieve this goal; however, they take away all the fun of figuring out how things actually work. Another issue is that most of the existing tools are written in C, which is unsafe and cryptic. With these reasons in mind, I decided to write all the code that controls the devices from scratch in Rust, and that’s what this article is about.

    Preparation and building

    Before writing any code, I had to plan what exactly I wanted to control and how. For the temperature and humidity sensor there were many choices, but since I didn’t have any specific requirements, I just picked the cheapest and simplest model available: the DHT-22. It cost around 7 EUR on Amazon and has a simple serial interface.

    The lights were more complicated. Since I was renting the flat, I couldn’t change how electricity flows into the bulbs, so I had to abandon the idea of controlling the ceiling lights. Instead I decided to control only a floor lamp near the couch and the Christmas lights that were still hanging above the table. At first I wanted to modify the lamp and the lights themselves, but since both plug into plain AC power sockets, I went with a “smart socket” solution: the lamp and the lights plug into sockets, and the RPi controls the sockets.

    Now, what exactly does it mean “to control power sockets with a Raspberry Pi”? The RPi controls things through its General Purpose I/O interface, aka GPIO: basically a set of pins on the board, each of which can input or output either 0 or 3.3 VDC. The problem is that it can’t be the source of electricity for the power sockets, because they need 220 VAC. Luckily, there are special electric switches, called relays, which let you control the flow of AC using a DC signal. The simplest relay is just a coil that generates a magnetic field which moves a switch. I went for this one, though it is possible to spend a bit more money and buy a solid-state relay, which does not produce a clicking sound when it switches on and off.

    Other things that are really handy when working with the RPi’s GPIO are a breadboard, a T-cobbler and a multimeter. The breadboard makes it very easy to build circuits; the T-cobbler maps the GPIO ports onto the breadboard’s sockets and labels them; the multimeter is used to debug the GPIO, since it measures voltage, current and resistance.

    My final shopping list was:

    1. DHT-22 temperature and humidity sensor (7 EUR)
    2. 4-channel 5V relay module (10 EUR)
    3. Breadboard (7 EUR)
    4. GPIO T-Cobbler for Raspberry Pi (12 EUR)
    5. Set of jumper wires (7 EUR)
    6. Set of resistors (9 EUR)
    7. Couple of power sockets (6 EUR)
    8. Light switch (2 EUR)
    9. Copper wires for AC installation, screws, plastic connectors, etc. (10 EUR)

    70 EUR in total, which is roughly the cost of two off-the-shelf “smart” power sockets.

    Below is the wiring diagram for the system:

    Thick lines stand for AC cables, thin ones for DC, just like in real life. As you can see, the DHT-22 sensor needs a pull-up resistor to keep the line high when no communication is taking place. The AC relay doesn’t need one, since it’s mechanical inside and slight noise on the line won’t hurt it or trigger it falsely.

    Controlling GPIO

    Now that everything is ready, it’s time to figure out how the RPi actually controls things. As I mentioned before, it has a set of special I/O pins called GPIO. They are usually accessible through a set of registers in the internal memory of the MCU: you write to a specific register to enable a selected pin, choose its operational mode, enable interrupts, etc. Raspbian Linux exposes the internal MCU memory via the virtual device /dev/mem; it’s more convenient, however, to use /dev/gpiomem for GPIO control because it doesn’t require root privileges. The memory layout for GPIO can be found in the datasheet of your processor; in the case of my Raspberry Pi 2 Model B, that is the BCM2835 peripherals datasheet.

    The implementation of the GPIO driver is straightforward: first it opens /dev/gpiomem and maps it into memory using mmap; then it accesses that memory whenever it needs to change a pin’s operating mode or read/write its level:

    extern crate libc;
    
    use self::libc::{c_void, c_int, c_char, open, close, mmap, munmap,
        O_RDWR, O_SYNC, PROT_READ, PROT_WRITE, MAP_SHARED, MAP_FAILED};
    use std::ffi::CString;
    use std::io;
    use std::ptr;
    use std::result;
    
    
    pub struct GPIO {
        mem: *mut u32,
    }
    
    #[derive(Eq, PartialEq, Copy, Clone)]
    pub enum Mode {
        Input,
        Output,
    }
    
    #[derive(Debug)]
    pub enum Error {
        IOError(io::Error),
        CantOpenGPIOMem,
        CantMapGPIOMem,
    }
    
    type Result<T> = result::Result<T, Error>;
    
    impl GPIO {
        pub fn new() -> Result<GPIO> {
            let gpio_fd = open_gpio_mem()?;
            let mem = map_gpio_mem(gpio_fd)?;
            Ok(GPIO { mem: mem as *mut u32})
        }
    
        pub fn set_mode(&mut self, pin: u8, mode: Mode) {
            check_pin_num(pin);
            let offset = pin as isize / 10;
            let shift = 3 * (pin as isize % 10);
            unsafe {
                *self.mem.offset(offset) &= !(7 << shift);
                if mode == Mode::Output {
                    *self.mem.offset(offset) |= 1 << shift;
                }
            }
        }
    
        pub fn set_high(&mut self, pin: u8) {
            check_pin_num(pin);
            unsafe { *self.mem.offset(GPIO_MEM_OUTPUT_SET_OFFSET / 4) = 1 << pin; }
        }
    
        pub fn set_low(&mut self, pin: u8) {
            check_pin_num(pin);
            unsafe { *self.mem.offset(GPIO_MEM_OUTPUT_CLEAR_OFFSET / 4) = 1 << pin; }
        }
    
        pub fn read_input(&mut self, pin: u8) -> bool {
            check_pin_num(pin);
            let byte = unsafe { *self.mem.offset(GPIO_MEM_INPUT_LEVEL_OFFSET / 4) };
            byte & (1 << pin) > 0
        }
    }
    
    impl Drop for GPIO {
        fn drop(&mut self) {
            unsafe { munmap(self.mem as *mut c_void, GPIO_LENGTH); }
        }
    }
    
    impl From<io::Error> for Error {
        fn from(err: io::Error) -> Error {
            Error::IOError(err)
        }
    }
    
    const MAX_PIN: u8 = 32;
    const GPIO_MEM_FILENAME: &'static str = "/dev/gpiomem";
    const GPIO_LENGTH: usize = 4096;
    const GPIO_MEM_OUTPUT_SET_OFFSET: isize = 0x1C;
    const GPIO_MEM_OUTPUT_CLEAR_OFFSET: isize = 0x28;
    const GPIO_MEM_INPUT_LEVEL_OFFSET: isize = 0x34;
    
    fn open_gpio_mem() -> Result<c_int> {
        let filename = CString::new(GPIO_MEM_FILENAME).unwrap();
        let fd = unsafe { open(filename.as_ptr() as *const c_char, O_RDWR | O_SYNC) };
        if fd == -1 { Err(Error::CantOpenGPIOMem) } else { Ok(fd) }
    }
    
    fn map_gpio_mem(fd: c_int) -> Result<*mut c_void> {
        let mem = unsafe {
            mmap(ptr::null_mut(), GPIO_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
        };
        unsafe { close(fd); }
        if mem == MAP_FAILED { Err(Error::CantMapGPIOMem) } else { Ok(mem) }
    }
    
    fn check_pin_num(pin: u8) {
        if pin >= MAX_PIN {
            panic!("Invalid pin number: {}", pin)
        }
    }

    This driver is limited to polling and supports only 32 of the 54 possible GPIO pins. That is perfectly fine for my purposes, though, since the project only needs 3 pins: 1 to communicate with the DHT-22 and 2 to control the relay.

    While writing the driver I ran into a couple of issues that were caused by my inattention and couldn’t be caught by the Rust compiler. The first one was with memory offsets. In the datasheet they are given for the case when memory is accessed byte by byte; in the driver, however, the GPIO memory was mapped into a pointer of type u32 and the offset function was used to shift it. Since offset implements C-like pointer arithmetic, the pointer was actually advanced 4 times further than it should have been. Luckily, this was quickly identified by testing the pins with a multimeter.

    The second issue was even trickier: the driver spontaneously failed to open the GPIO memory device. At first I couldn’t figure out why: the right privileges were set, and I could open the same device with a simple Python script, but not with the driver. I tried cleaning and rebuilding a couple of times and suddenly it worked. The culprit was careless usage of Rust’s as_ptr method to make a pointer to a constant string containing the path to the device. In contrast to C strings, Rust strings are not null-terminated, so depending on how the compiler packed the string constants, I sometimes ended up with a path to a non-existent file. Wrapping the &'static str into std::ffi::CString saved the day.
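
    To illustrate the pitfall, here is a minimal sketch of the two variants (not the actual driver code; it only assumes the libc crate, which the driver already uses):

    extern crate libc;
    
    use std::ffi::CString;
    use self::libc::{open, c_char, O_RDWR};
    
    fn open_buggy() -> i32 {
        let path = "/dev/gpiomem";
        // WRONG: str::as_ptr points at bytes that are not null-terminated,
        // so open() may read whatever happens to follow them in memory.
        unsafe { open(path.as_ptr() as *const c_char, O_RDWR) }
    }
    
    fn open_fixed() -> i32 {
        // CString appends the terminating NUL byte for us.
        let path = CString::new("/dev/gpiomem").unwrap();
        unsafe { open(path.as_ptr(), O_RDWR) }
    }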

    Communicating with a sensor

    Let’s move on to the sensor. The DHT-22 implements a dead simple serial protocol: according to its datasheet, the MCU should pull the data pin low, wait for an acknowledgement from the sensor, and then listen for the pulses it sends. 40 bits of data are transferred: 16 bits of humidity, 16 bits of temperature and 8 bits of checksum. The key thing here is timing. If the driver is too late to read the sensor’s input, it may end up with incomplete or incorrect data, so it should keep running for the whole duration of the transmission. To prevent the driver’s process from being suspended by the system during that time, it is necessary to raise its priority and avoid functions that give up the CPU to another process. The latter requirement means that std::thread::sleep is forbidden and has to be replaced with a busy loop.

    The driver consists of a single dht22::read() function and does the following: it pulls the pin high for half a second, in case it was left low after a previous transmission; then pulls the pin low for 20 ms, waits for the sensor to respond with a low signal, and reads the pulses. A 0 is sent as a short high pulse, a 1 as a long high pulse. Between each pair of data pulses the sensor sends a low pulse whose duration is used as a reference to tell short and long high pulses apart. Once the data is read, the driver checks its consistency, processes it and returns the reading to the caller.

    extern crate libc;
    
    use self::libc::{sched_get_priority_max, sched_setscheduler, sched_param,
        SCHED_FIFO, SCHED_OTHER, timeval, gettimeofday, time_t, suseconds_t};
    use super::gpio::{self, GPIO};
    use std::result;
    use std::ptr;
    use std::time::Duration;
    
    #[derive(Debug)]
    pub enum Error {
        ReadTimeout,
        DataIsNotValid,
    }
    
    #[derive(Debug)]
    pub struct Reading {
        pub temperature: f32,
        pub humidity: f32,
    }
    
    type Result<T> = result::Result<T, Error>;
    
    pub fn read(gpio: &mut GPIO, pin: u8) -> Result<Reading> {
        let pulses = read_pulses(gpio, pin)?;
        let data = interpret_pulses(pulses);
        if is_checksum_valid(data) {
            Ok(make_reading(data))
        } else {
            Err(Error::DataIsNotValid)
        }
    }
    
    fn read_pulses(gpio: &mut GPIO, pin: u8) -> Result<[(usize, usize); PULSES_NUM]> {
        gpio.set_mode(pin, gpio::Mode::Output);
    
        set_max_priority();
    
        gpio.set_high(pin);
        busy_wait(Duration::from_millis(500));
    
        gpio.set_low(pin);
        busy_wait(Duration::from_millis(20));
    
        gpio.set_mode(pin, gpio::Mode::Input);
    
        let mut wait_count = 0;
        while gpio.read_input(pin) {
            wait_count += 1;
            if wait_count >= TIMEOUT_CYCLES {
                set_default_priority();
                return Err(Error::ReadTimeout)
            }
        }
    
        let mut pulses = [(0, 0); PULSES_NUM];
        for i in 0..PULSES_NUM {
            while !gpio.read_input(pin) {
                pulses[i].0 += 1;
                if pulses[i].0 >= TIMEOUT_CYCLES {
                    set_default_priority();
                    return Err(Error::ReadTimeout)
                }
            }
    
            while gpio.read_input(pin) {
                pulses[i].1 += 1;
                if pulses[i].1 >= TIMEOUT_CYCLES {
                    set_default_priority();
                    return Err(Error::ReadTimeout)
                }
            }
        }
    
        set_default_priority();
        Ok(pulses)
    }
    
    fn interpret_pulses(pulses: [(usize, usize); PULSES_NUM]) -> [i32; BLOCKS_NUM] {
        let threshold = pulses.iter().skip(1).map(|p| p.0).sum::<usize>() / (PULSES_NUM - 1);
        let mut data = [0 as i32; BLOCKS_NUM];
    
        for (i, p) in pulses.iter().skip(1).enumerate() {
            if p.1 > threshold {
                data[i / 8] |= 1 << (7 - (i % 8));
            }
        }
    
        data
    }
    
    fn is_checksum_valid(data: [i32; BLOCKS_NUM]) -> bool {
        let sum = data[0] + data[1] + data[2] + data[3];
        data[4] == sum & 0xFF
    }
    
    fn make_reading(data: [i32; BLOCKS_NUM]) -> Reading {
        let h = data[0] * 256 + data[1];
        let t = (data[2] & 0x7F) * 256 + data[3];
        let t = if data[2] & 0x80 > 0 { -t } else { t };
    
        Reading { temperature: t as f32 / 10.0, humidity: h as f32 / 10.0 }
    }
    
    fn set_max_priority() {
        unsafe {
            let max_priority = sched_get_priority_max(SCHED_FIFO);
            let sched = sched_param { sched_priority: max_priority };
            sched_setscheduler(0, SCHED_FIFO, &sched);
        }
    }
    
    fn set_default_priority() {
        unsafe {
            let sched = sched_param { sched_priority: 0 };
            sched_setscheduler(0, SCHED_OTHER, &sched);
        }
    }
    
    fn busy_wait(duration: Duration) {
        unsafe {
            let secs = duration.as_secs() as time_t;
            let usecs = (duration.subsec_nanos() / 1000) as suseconds_t;
    
            let mut now = timeval { tv_sec: 0, tv_usec: 0 };
            gettimeofday(&mut now, ptr::null_mut());
            // Normalize so that tv_usec stays below one second.
            let mut end = timeval { tv_sec: now.tv_sec + secs, tv_usec: now.tv_usec + usecs };
            if end.tv_usec >= 1_000_000 {
                end.tv_sec += 1;
                end.tv_usec -= 1_000_000;
            }

            while now.tv_sec < end.tv_sec || (now.tv_sec == end.tv_sec && now.tv_usec < end.tv_usec) {
                gettimeofday(&mut now, ptr::null_mut());
            }
        }
    }
    
    const PULSES_NUM: usize = 41;
    const BLOCKS_NUM: usize = PULSES_NUM / 8;
    const TIMEOUT_CYCLES: usize = 64000;

    Exposing everything in a web interface

    Last but not least: how to use the things written above. I chose to build a simple one-page web server and run it on the RPi; here’s how it looks:

    I don’t think the code for it is worth posting here, but the framework that made it so simple is worth mentioning. There are plenty of Rust web frameworks, but I found Rocket to be the best documented and easiest to use of them all.
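
    Just to give a flavour of what Rocket code looks like (this is the framework’s standard hello-world shape for the versions available at the time, not the actual app; the real routes would toggle the relay pins and return sensor readings):

    #![feature(plugin)]
    #![plugin(rocket_codegen)]
    
    extern crate rocket;
    
    #[get("/")]
    fn index() -> &'static str {
        "Home automation"
    }
    
    fn main() {
        rocket::ignite().mount("/", routes![index]).launch();
    }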

    Final thoughts

    This small project was real fun! I learned a lot about electricity, circuits, MCUs and inter-device communication. I also wrote some more or less real-world Rust, and was really surprised at how simple it was to program in and how well it worked on the RPi.

  2. Working at foreign startup: lessons learned

    Nov 22, 2016 in Life

    I changed jobs a month ago. The reason was simple: my previous company was no more, had ceased to be, expired and gone to meet its maker. Sure, it happens quite often in the startup world, but this experience was new to me and, frankly, a very instructive one. As a note to myself, I’m going to analyse it and distil a set of valuable “life lessons” that I’ve learned from it.

    How Did I Get There

    One ordinary November day, almost a year ago, I got a message from a stranger. She said that she worked for a startup that helped companies find their ideal employees. She was interested in me as a potential candidate because of my GitHub profile and was eager to invite me into their system. To get there I had to fill out my profile and pass a basic Skype interview with a recruiter. After that I got a couple of weeks of “being visible”; during that time companies could see my profile and send me requests for their own interviews. Since there was nothing for me to lose, I went through the steps and forgot about it.

    After a couple of days I got a message from a small startup. They had been doing telemedicine for a couple of years and looked pretty good. At the end of a Skype interview, which was mostly chatting about my experience and so on, they asked me whether I liked them and offered me a job. Being upset after a rejection from Google a month earlier, I was very glad to get an offer so soon and carelessly accepted it without any bargaining. In addition to a modest salary, I also got a pack of stock options. Dreaming of luxury yachts and expensive cars, I started collecting the papers necessary for a German visa.

    Lesson: Take your time to make a decision. Get a handful of other options to consider before saying “yes” or “no”.

    Lesson: Evaluate the cost of living and the salaries in your field in the country you’re planning to work in before accepting any offer, or even negotiating.

    Lesson: Pay attention to the interview process at the company you’re applying to: while hard whiteboard coding interviews won’t guarantee that 100% of the company’s employees are and will be competent, the absence of any technical tasks raises the risk of black sheep slipping through.

    Lesson: Take a stock option offer with a grain of salt: in addition to the slim chances of your new company becoming the next Google, there are a lot of tricky clauses and rules which can leave you without the desired wealth at the snap of a finger. Options are a nice addition to a decent salary; accepting them as a substitute for part of your salary is not a good idea, though.

    The Best Days

    Fast-forward to April 2016, when I finally arrived in Berlin. I had been working remotely since January and finally had an opportunity to meet my new colleagues face to face. On the very first evening I went with some of them to a restaurant near my place and had a good dinner there. The next day I met the rest of the team; they were kind and friendly. So far, so good.

    The real issue I soon discovered was the absence of anybody taking care of paperwork and onboarding. It was unclear whom I should ask about the necessary documents and the promised phone and gym subscriptions. It was always someone’s side responsibility, and sometimes people didn’t pay much attention to it. One example was how I tried to get paid for the time I had worked for the company. It turned out to be a side responsibility of the CFO. He gave me a form which I had to fill out so that he could later pass it on to Accounting (which the company outsourced). The form was not the most descriptive one, but I managed to complete it with help from my fellow developers who had been in the same boat before. I sent it to the CFO right before the end of April and started waiting for my salary. When it didn’t come on time, I asked him about it. He replied that he had forgotten to pass the form on to Accounting because he was too busy.

    Lesson: Pay attention to the processes in your company. It’s management’s job to make sure that the company works like clockwork, both externally and internally. Neither the quality of the product, nor the user base, nor anything else can save a company that is managed badly.

    Apart from this nuisance with getting my salary, working there at that time was good. Every week we had company meetings and watched the product grow and people use it. Though some time ago I decided not to get personally attached to the projects I work on anymore, I did have a good feeling that “this thing I’m working on right now is useful to people”.

    So, by the end of June we had a successful launch in Norway and a big feature finished and ready to be rolled out. We were also going to move to another office on the last day of June. When I came to the office that day, I thought we were on the road to success, but half of the team getting fired proved me wrong.

    Lesson: Life is volatile. Even when it seems that everything is better than ever, things can get much worse, and it’s good to be prepared for this turn.

    Struggling and The End

    After those events, work did not progress well: partly because everybody was uncertain about the future, partly because there was no new road map yet. We basically spent the whole of July slacking off, drinking beer and hanging out. Things started getting a bit better in August, at least on the product side, but the situation was still very volatile. New unpleasant discoveries on the management and financial sides were made almost every week; they didn’t make us feel better either.

    The pleasant and somewhat funny consequence of this situation was an increased number of team events. I think that almost any time the founders were in Berlin, we went somewhere with them to eat and drink. In the end, it turned out that the company fell apart earlier than the team did.

    On the day we found out that the company was no more, we went for a last team hangout. I got pretty drunk and returned home at 4 a.m. Two days later I flew to Italy on vacation, and after that I started looking for a new job. But that is a different story.

  3. ghcid: must-have tool for Haskell developer

    Aug 18, 2016 in Haskell, Programming

    When programming in Haskell, I usually rely a lot on the compiler. It catches many errors caused by inattention, which occur especially often when refactoring large portions of code. In some sense the famous “it compiles, therefore it works” mantra is true (assuming you don’t have logical errors). The problem with this approach, for a long time, was that I ran the compiler manually and then read its output. The output was usually quite long, and I had to read it and start fixing errors from the top, otherwise I might end up fixing the wrong error. It was a tedious routine, even with the help of bash scripts and file watchers, but thanks to ghcid I am now free from it.

    ghcid can be read as “GHCi daemon” or “GHC with a bit of an IDE”. It is a rather small program which starts GHCi, loads your modules into it, executes some Haskell expression there, watches your source code, and fires :reload and re-evaluates the expression whenever something changes. It also captures the warnings and errors produced by the compiler and presents them in a way that lets you see the topmost ones right away, without scrolling the whole output to the top. Last but not least, it plays really well with stack, which is the de facto standard tool for building and testing Haskell projects nowadays.

    How to use it

    First, install it:

    $ stack install ghcid
    

    Now you can run it with stack exec. At this point I usually write a bash script and tune ghcid the way I need it for my application.

    The first thing to set up is the expression which will be evaluated in the GHCi session. You can do it with the -T flag. If you want to run something other than GHCi, you can pass the command via the -c flag. By default ghcid detects whether you’re using stack or plain cabal and runs either stack repl or cabal repl respectively.

    Another thing I usually turn on for my projects is the -Wall and -Werror compiler flags. They catch things such as redundant imports, orphan instances and unused variables, and help you keep the codebase clean. However, when working in a REPL-driven mode, you’d usually concentrate on the big picture and leave those warnings to be fixed later, e.g. before committing the code. When the -W option is passed, ghcid doesn’t take those warnings-turned-errors into account and evaluates the given expression anyway.

    By default ghcid watches only the source files that were loaded by GHCi. Sometimes that is not enough. For example, I’m currently working on an application which keeps Hamlet templates in the templates directory, and I’d like to reload the app each time one of the templates changes. Luckily, it’s easy to make ghcid do so with the --restart=<PATH> or --reload=<PATH> options. The first one rebuilds the project and restarts the whole GHCi process, while the second one just fires the :reload command in the existing GHCi process whenever any of the files or directories under <PATH> changes.
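
    Putting these options together, the one-liner in my bash script ends up looking roughly like this (the test expression and the template path here are made up for illustration):

    $ ghcid -c "stack repl" -T "main" -W --reload=templates
    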

    Issues and ways to improve

    One thing that would be nice to have is error highlighting. Sure, errors are printed in bold now, but having line numbers and error messages in different colors would help even more. Right now this can be achieved by running ghcid inside a terminal emulator in Neovim or Emacs.

    At the moment highlighting is the only improvement that comes to my mind. Overall, ghcid is a great development tool, and even minor bugs (usually caused by GHCi) can’t really outweigh its advantages. Definitely a must-have.


  4. Algorithms 101: Sorting

    Aug 12, 2016 in Algorithms

    This article covers several sorting algorithms: those working in quadratic time (bubble, insertion and selection sort), in loglinear time (merge sort, quicksort and heap sort) and in linear time, but only for special kinds of data (counting and radix sort). The source code of the examples used in this article is available on GitHub.

    Quadratic time algorithms

    There is not much to say about quadratic time algorithms. Usually they are not used on their own, but as a helper inside some loglinear algorithm. For example, insertion sort is sometimes used inside merge sort or quicksort to sort small subarrays. As for bubble sort, it is usually considerably slower than the other quadratic time algorithms and is not used in practice.

    Bubble sort

    Idea: repeatedly iterate over the array, comparing items pairwise and exchanging them if they are out of order. It is easy to notice that on each pass the biggest item ends up at the end of the array, so at most n passes are needed to put all items in their places. This algorithm is stable.

    package com.dancingrobot84.algorithms.sorting;
    
    import com.dancingrobot84.algorithms.util.ArraysExt;
    
    public class BubbleSort {
        public static void sort(int[] array) {
            boolean isSorted;
            do {
                isSorted = true;
                for (int i = 0; i < array.length - 1; i++) {
                    if (array[i] > array[i+1]) {
                        ArraysExt.swap(array, i, i+1);
                        isSorted = false;
                    }
                }
            } while (!isSorted);
        }
    }

    Insertion sort

    Idea: iterate over the array, keeping its left side sorted. When the next item is encountered, push it to the left until the item before it is less than or equal to it. There are two loops, each of at most n iterations, hence quadratic time. This algorithm is stable.

    package com.dancingrobot84.algorithms.sorting;
    
    import com.dancingrobot84.algorithms.util.ArraysExt;
    
    public class InsertionSort {
        public static void sort(int[] array) {
            for (int i = 1; i < array.length; i++) {
                for (int j = i; j > 0 && array[j] < array[j-1]; j--) {
                    ArraysExt.swap(array, j, j-1);
                }
            }
        }
    }

    Selection sort

    Idea: iterate over all positions in the array. For each position i, search for the minimal item in the range [i, n) and put it at position i. Each search for the minimal item takes O(n) time, giving quadratic time overall. This algorithm can be implemented so that it is stable; the implementation below, however, is not.

    package com.dancingrobot84.algorithms.sorting;
    
    import com.dancingrobot84.algorithms.util.ArraysExt;
    
    public class SelectionSort {
        public static void sort(int[] array) {
            for (int i = 0; i < array.length; i++) {
                int curMinIndex = i;
                for (int j = i; j < array.length; j++) {
                    if (array[j] < array[curMinIndex]) {
                        curMinIndex = j;
                    }
                }
                ArraysExt.swap(array, curMinIndex, i);
            }
        }
    }

    Loglinear time algorithms

    Loglinear algorithms are the ones widely used in practice. It is proven that it is not possible to sort arbitrary data using comparisons faster than O(n log(n)), so these algorithms are asymptotically the fastest comparison sorts. Which algorithm to use in a given program, however, depends on its goals and constraints.

    Merge sort

    Idea (top-down merge sort): split the array in halves, sort each half recursively and then merge them. By the master theorem, T(n) = 2T(n/2) + O(n) = O(n log(n)). This algorithm is stable.

    Bottom-up merge sort treats an n-element array as n singleton arrays. It merges them pairwise, producing n/2 two-element arrays, and keeps merging until it finally merges two n/2-element arrays into one. It can be implemented without recursion; a sketch of this variant is shown below.
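
    This sketch is not part of the article’s repository; it is kept self-contained, so it carries its own merge helper:

    package com.dancingrobot84.algorithms.sorting;
    
    public class BottomUpMergeSort {
        public static void sort(int[] array) {
            int n = array.length;
            int[] buffer = new int[n];
            // Merge runs of width 1, 2, 4, ... until a single run covers the array.
            for (int width = 1; width < n; width *= 2) {
                for (int start = 0; start < n; start += 2 * width) {
                    int mid = Math.min(start + width, n);
                    int end = Math.min(start + 2 * width, n);
                    merge(array, buffer, start, mid, end);
                }
                // The next pass reads from array again, so copy the merged runs back.
                System.arraycopy(buffer, 0, array, 0, n);
            }
        }
    
        // Merge array[start..mid) and array[mid..end) into buffer[start..end).
        private static void merge(int[] array, int[] buffer, int start, int mid, int end) {
            int left = start;
            int right = mid;
            for (int cur = start; cur < end; cur++) {
                if (left < mid && (right >= end || array[left] <= array[right])) {
                    buffer[cur] = array[left++];
                } else {
                    buffer[cur] = array[right++];
                }
            }
        }
    }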

    The implementation below uses O(n) extra space for a buffer array which is used when merging two subarrays. It is a standard recursive top-down merge sort.

    package com.dancingrobot84.algorithms.sorting;
    
    import java.util.Arrays;
    
    public class MergeSort {
        public static void sort(int[] array) {
            new Sorter(array).sort();
        }
    
        private static class Sorter {
    
            private final int[] originalArray;
            private final int[] buffer;
    
            public Sorter(int[] array) {
                originalArray = array;
                buffer = new int[array.length];
            }
    
            public void sort() {
                sort(0, originalArray.length-1);
            }
    
            private void sort(int start, int end) {
                if (start >= end) {
                    return;
                }
    
                int mid = (start + end) / 2;
                sort(start, mid);
                sort(mid+1, end);
                merge(start, end);
            }
    
            private void merge(int start, int end) {
                int mid = (start + end) / 2;
                int left = start;
                int right = mid+1;
                int cur = 0;
    
                for (; left <= mid && right <= end; cur++) {
                    if (originalArray[left] <= originalArray[right]) {
                        buffer[cur] = originalArray[left];
                        left++;
                    } else {
                        buffer[cur] = originalArray[right];
                        right++;
                    }
                }
    
                for (; left <= mid; cur++, left++) {
                    buffer[cur] = originalArray[left];
                }
    
                for (; right <= end; cur++, right++) {
                    buffer[cur] = originalArray[right];
                }
    
                System.arraycopy(buffer, 0, originalArray, start, cur);
            }
        }
    }

    Merge sort is an easily parallelizable algorithm. It is also more efficient than quicksort on linked lists, where it can be implemented using only constant extra space.

    Quicksort

    Idea: split the array into two parts by comparing each element to a pivot, then recursively sort those parts. Because quicksort’s behaviour depends on the contents of the array, it is possible to construct an array that forces the algorithm to run in quadratic time. Although this problem is largely mitigated by choosing the pivot at random, to be precise one has to say that the algorithm has O(n log(n)) time complexity in the average case and O(n^2) in the worst case. Space complexity is O(log(n)) in the average case and O(n) in the worst case, since the algorithm is recursive.

    Another delicate part of quicksort is how to partition the array in constant space. To do that, create a hiStart pointer which stores the index of the beginning of the second part of the array. Then iterate over the array comparing items to the pivot: if the current item is less than the pivot, swap it with the item at hiStart and increment hiStart.

    To speed up quicksort when there are many repeated items, one could split the array into three parts: less than, equal to and greater than the pivot. It requires one more pointer, loEnd, which stores the index of the end of the first part; a sketch of this variant is shown below.
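
    The following three-way (“Dutch national flag”) partitioning sketch is not part of the article’s repository; items equal to the pivot end up in the middle block and are excluded from further recursion:

    package com.dancingrobot84.algorithms.sorting;
    
    import java.util.Random;
    
    public class QuickSort3Way {
        public static void sort(int[] array) {
            sort(array, 0, array.length - 1);
        }
    
        private static final Random generator = new Random();
    
        private static void sort(int[] array, int start, int end) {
            if (start >= end) {
                return;
            }
            int pivot = array[start + generator.nextInt(end - start + 1)];
    
            int loEnd = start;      // array[start..loEnd) is less than the pivot
            int hiStart = end + 1;  // array[hiStart..end] is greater than the pivot
            int i = start;
            while (i < hiStart) {
                if (array[i] < pivot) {
                    swap(array, i++, loEnd++);
                } else if (array[i] > pivot) {
                    swap(array, i, --hiStart);
                } else {
                    i++;
                }
            }
    
            sort(array, start, loEnd - 1);
            sort(array, hiStart, end);
        }
    
        private static void swap(int[] array, int i, int j) {
            int tmp = array[i];
            array[i] = array[j];
            array[j] = tmp;
        }
    }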

    The implementation below chooses a random pivot and uses the two-way partition. It is not stable; it is possible to make quicksort stable, but that requires an additional O(n) space. This implementation uses O(1) extra space for partitioning.

    package com.dancingrobot84.algorithms.sorting;
    
    import com.dancingrobot84.algorithms.util.ArraysExt;
    import java.util.Arrays;
    import java.util.Random;
    
    public class QuickSort {
        public static void sort(int[] array) {
            sort(array, 0, array.length-1);
        }
    
        private static Random generator = new Random();
    
        private static void sort(int[] array, int start, int end) {
            if (start >= end) {
                return;
            }
    
            int pivotIndex = partition(array, start, end);
            sort(array, start, pivotIndex);
            sort(array, pivotIndex+1, end);
        }
    
        private static int partition(int[] array, int start, int end) {
            int pivotIndex = start + generator.nextInt(end - start);
            int pivot = array[pivotIndex];
            ArraysExt.swap(array, pivotIndex, end);
    
            int hiStart = start;
            for (int i = start; i < end; i++) {
                if (array[i] < pivot) {
                    ArraysExt.swap(array, hiStart, i);
                    hiStart += 1;
                }
            }
    
            ArraysExt.swap(array, hiStart, end);
            return hiStart;
        }
    }

    Quicksort has a better constant factor than merge sort and heap sort. When implemented well, it can outperform its competitors drastically. It is possible to parallelize quicksort, but it is a bit harder to do than in the case of merge sort.

    Heap sort

    Idea: build a heap on the input array, then repeatedly take the minimal element out of it until no items are left. Since building a heap takes O(n) time and retrieving the minimal element takes O(log(n)) time, the overall time complexity of this sorting algorithm is O(n log(n)). This algorithm is not stable.

    The same idea works with any data structure which can be built in O(n log(n)) or faster and supports retrieving the minimum/maximum in O(log(n)). A heap is preferred because it can be built in place on the input array, thus spending only O(1) extra space.

    package com.dancingrobot84.algorithms.sorting;
    
    import com.dancingrobot84.algorithms.datastructures.Heap;
    import com.dancingrobot84.algorithms.datastructures.impl.BinaryHeap;
    import com.dancingrobot84.algorithms.util.ArraysExt;
    import java.util.Arrays;
    
    public class HeapSort {
        public static void sort(Integer[] array) {
            Heap<Integer> heap = BinaryHeap.fromArray(array);
            int i = array.length - 1;
            while (heap.size() > 0) {
                array[i] = heap.pop();
                i--;
            }
            ArraysExt.reverse(array);
        }
    }

    Linear time algorithms

    Linear time sorting algorithms do not compare elements; otherwise they would be bound by O(n log(n)). Instead, they either impose constraints on the input or exploit some knowledge about the items of the array. They are usually used inside more complex algorithms to cut some time here and there, e.g. when building suffix arrays or sorting data on parallel clusters.

    Counting sort

    Assumption: all items of the input array belong to some finite set, or can be uniquely associated with a key from this finite set. Denote the size of this set by k.

    Idea: iterate over the input array counting how many times each distinct item occurs, then use this information to put the items in their places. To store the occurrences we need an array of integers of length k and an output array of length n, so we need O(n+k) space. Both of these arrays are traversed during the execution of the algorithm, so its time complexity is O(n+k) too.

    package com.dancingrobot84.algorithms.sorting;
    
    import java.util.Arrays;
    
    public class CountingSort {
        public static final int K = 100;
    
        public static void sort(int[] array) {
            int[] counter = new int[K];
            for (int item: array) {
                counter[item]++;
            }
    
            int totalCount = 0;
            for (int i = 0; i < K; i++) {
                int oldCount = counter[i];
                counter[i] = totalCount;
                totalCount += oldCount;
            }
    
            int[] arrayCopy = Arrays.copyOf(array, array.length);
            for (int item: arrayCopy) {
                array[counter[item]] = item;
                counter[item]++;
            }
        }
    }

    Radix sort

    Assumption: each item of the input array can be represented in positional notation, i.e. as a sequence of characters from a finite alphabet of size k, where the weight of each character depends on its position within the sequence. Examples of positional notation are binary and decimal numbers and strings.

    Idea: sort the input items by the character at a single fixed position, for every possible position, e.g. sort by the 1st character, then by the 2nd, and continue until all m positions have been used. Sorting by one position can be done with counting sort, since the alphabet is finite; therefore it takes O(m(n+k)) time and O(mn) space to sort the whole array.

    Depending on the starting position, radix sort can be LSD (least significant digit first) or MSD (most significant digit first). The first one sorts in numeric order (correct for decimals), the second one sorts in lexicographic order. The implementation below is an LSD radix sort.

    package com.dancingrobot84.algorithms.sorting;
    
    import java.util.Arrays;
    
    public class RadixSort {
        public static final int K = 2;
        public static final int M = 32;
    
        public static void sort(int[] array) {
            int[] temp = array;
            for (int i = 0; i < M; i++) {
                temp = sort(temp, i);
            }
            System.arraycopy(temp, 0, array, 0, array.length);
        }
    
        private static int[] sort(int[] array, int bit) {
            int[] counter = new int[K];
            for (int i = 0; i < array.length; i++) {
                counter[getBit(array[i], bit)]++;
            }
    
            int totalCount = 0;
            for (int i = 0; i < K; i++) {
                int oldCount = counter[i];
                counter[i] = totalCount;
                totalCount += oldCount;
            }
    
            int[] result = new int[array.length];
            for (int item: array) {
                result[counter[getBit(item, bit)]] = item;
                counter[getBit(item, bit)]++;
            }
            return result;
        }
    
        private static int getBit(int n, int k) {
            return (n >> k) & 1;
        }
    }

    Algorithms comparison chart

    Algorithm        Avg. time     Worst time    Avg. space   Stability   Generality
    Bubble sort      O(n^2)        O(n^2)        O(1)         Yes         Yes
    Insertion sort   O(n^2)        O(n^2)        O(1)         Yes         Yes
    Selection sort   O(n^2)        O(n^2)        O(1)         No          Yes
    Merge sort       O(n log(n))   O(n log(n))   O(n)         Yes         Yes
    Quicksort        O(n log(n))   O(n^2)        O(log(n))    No          Yes
    Heap sort        O(n log(n))   O(n log(n))   O(1)         No          Yes
    Counting sort    O(n+k)        O(n+k)        O(n+k)       Yes         No
    Radix sort       O(m(n+k))     O(m(n+k))     O(mn)        Yes         No

    1. Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms

    2. Sorting - topcoder

  5. Algorithms 101: Basic Data Structures Pt. 2

    Aug 12, 2016 in Algorithms

    This article covers another bunch of must-know data structures: heaps, priority queues, associative arrays and hash tables. The source code of the examples used in this article is available on GitHub.

    Heaps

    A heap is a tree-like data structure satisfying the following property: the key of a parent node is ordered with respect to the keys of its children. According to the order, heaps are divided into max heaps (the parent key is greater than the child’s) and min heaps (the parent key is less than the child’s). According to the structure of the underlying tree, heaps can be binary, d-ary, binomial, Fibonacci and many others. The following interface is used to represent heaps:

    package com.dancingrobot84.algorithms.datastructures;
    
    public interface Heap<T extends Comparable<? super T>> {
        /** Get top item without removing it from heap **/
        public T peek();
    
        /** Get top item and remove it from heap **/
        public T pop();
    
        /** Return number of items in heap **/
        public int size();
    
        /** Insert new item into heap **/
        public void insert(T item);
    }

    Heaps are usually used to implement priority queues, which in turn appear in some graph algorithms (Prim, Dijkstra). The heapsort sorting algorithm is another example of a heap’s usage.

    AFAIK there is no standalone heap data structure in the Java and C++ standard libraries. However, there is the std::make_heap function in the C++ standard library, which rearranges the elements of a given container so that they form a heap. As for Java, there is the PriorityQueue class, which uses a heap under the hood.

    Binary min heap implementation

    Idea: this implementation does not have an explicit tree structure. Instead, an array is used to store the nodes, and the index of a given node is used to find its parent and children (the getParent, getLeftChild and getRightChild methods). The image below gives an example of how a binary heap of 7 items is stored in an array.

    Binary tree stored in array

    The most important part of the heap implementation is the pair of methods that maintain the heap property. In the case of a binary heap there are two of them: bubbleUp and bubbleDown. bubbleUp compares a given element with its parent and, if they violate the heap property, exchanges them and moves up the tree. bubbleDown does the same thing, only with the children of the given element, and moves down the tree. The first method is used to implement insert: we put the new item at the end of the array and run bubbleUp on it to move it into its place. The second method is used to implement pop: we exchange the top item with the last one and run bubbleDown on the new top item to move it into its place.

    The last thing to figure out is how to heapify a given array efficiently. The naive approach is to create an empty heap and perform n insertions, which gives us O(n log(n)) time and O(n) space. However, it is possible to do it in O(n) time and O(1) extra space using the bubbleDown method: just iterate over the array in reverse order and run bubbleDown on each item.

    package com.dancingrobot84.algorithms.datastructures.impl;
    
    import com.dancingrobot84.algorithms.datastructures.Heap;
    import com.dancingrobot84.algorithms.util.ArraysExt;
    
    public class BinaryHeap<T extends Comparable<? super T>> implements Heap<T> {
    
        public static <T extends Comparable<? super T>> Heap<T> emptyWithCapacity(int capacity) {
            return new BinaryHeap<T>(capacity);
        }
    
        public static <T extends Comparable<? super T>> Heap<T> fromArray(T[] array) {
            return new BinaryHeap<T>(array);
        }
    
        @Override
        public T peek() {
            assert size() > 0 : "Heap is empty";
            return array[0];
        }
    
        @Override
        public T pop() {
            T min = peek();
            ArraysExt.swap(array, 0, lastPos);
            lastPos -= 1;
            bubbleDown(0);
            return min;
        }
    
        @Override
        public int size() {
            return lastPos + 1;
        }
    
        @Override
        public void insert(T item) {
            assert size() < capacity() : "Heap is full";
            lastPos += 1;
            array[lastPos] = item;
            bubbleUp(lastPos);
        }
    
        public int capacity() {
            return array.length;
        }
    
        private final T[] array;
        private int lastPos;
    
        @SuppressWarnings("unchecked")
        private BinaryHeap(int capacity) {
            array = (T[])new Comparable[Math.max(capacity, 0)];
            lastPos = -1;
        }
    
        private BinaryHeap(T[] originalArray) {
            array = originalArray;
            lastPos = array.length - 1;
            for (int i = lastPos; i >= 0; i--) {
                bubbleDown(i);
            }
        }
    
        private void bubbleDown(int pos) {
            if (pos < 0 || pos > lastPos) {
                return;
            }
    
            int minPos = pos;
    
            int leftPos = getLeftChild(pos);
            if (leftPos <= lastPos && array[minPos].compareTo(array[leftPos]) > 0) {
                minPos = leftPos;
            }
    
            int rightPos = getRightChild(pos);
            if (rightPos <= lastPos && array[minPos].compareTo(array[rightPos]) > 0) {
                minPos = rightPos;
            }
    
            if (minPos != pos) {
                ArraysExt.swap(array, pos, minPos);
                bubbleDown(minPos);
            }
        }
    
        private void bubbleUp(int pos) {
            int parentPos = getParent(pos);
            if (pos < 0 || pos > lastPos || parentPos < 0 || parentPos > lastPos) {
                return;
            }
    
            if (array[parentPos].compareTo(array[pos]) > 0) {
                ArraysExt.swap(array, parentPos, pos);
                bubbleUp(parentPos);
            }
        }
    
        private int getLeftChild(int parentPos) {
            return 2 * parentPos + 1;
        }
    
        private int getRightChild(int parentPos) {
            return 2 * parentPos + 2;
        }
    
        private int getParent(int childPos) {
            childPos = childPos % 2 == 0 ? childPos - 2 : childPos - 1;
            return childPos / 2;
        }
    }

    size and peek have O(1) time complexity. Because both bubbleUp and bubbleDown take O(log(n)) time, the insert and pop methods have the same complexity. As for the BinaryHeap(T[]) constructor: though it is just n runs of bubbleDown, which should give us O(n log(n)) time, it actually works faster than that, because most of the heapification happens at the lower levels of the tree. A complete proof of its O(n) time complexity can be found on Wikipedia.

    Priority Queues

    A priority queue is an abstract data structure similar to a stack or a queue, but the order in which items are retrieved from it depends on the items’ priorities. Usually the item with the highest priority is served first.

    Priority queues are used in many popular algorithms: Dijkstra, Prim, Huffman encoding, etc. There are implementations of priority queues in C++ and Java: the std::priority_queue and PriorityQueue classes respectively.

    Implementation of priority queue using binary min heap

    No additional comments, as the implementation is quite straightforward.

    package com.dancingrobot84.algorithms.datastructures;
    
    import com.dancingrobot84.algorithms.datastructures.impl.BinaryHeap;
    
    public class PriorityQueue<T extends Comparable<? super T>> {
        private final Heap<T> heap;
    
        public PriorityQueue(int capacity) {
            heap = BinaryHeap.<T>emptyWithCapacity(capacity);
        }
    
        public void insert(T item) {
            heap.insert(item);
        }
    
        public T pop() {
            return heap.pop();
        }
    }

    Associative arrays

    An associative array, or map, is an abstract data structure. It stores key-value pairs in such a way that each possible key appears at most once. The following interface is used to describe an associative array:

    package com.dancingrobot84.algorithms.datastructures;
    
    public interface Map<K extends Comparable<? super K>, V> {
        /** Return value associated with a key or null otherwise **/
        public V get(K key);
    
        /** Insert value into the map under specified key or rewrite if such key already exists **/
        public void put(K key, V value);
    }

    Associative arrays are applicable to a great variety of computing tasks: removing duplicates, counting items, etc. There are even special databases called “key-value stores” which keep key-value pairs either in memory or on disk (Redis, Memcached, etc.).

    Hash Tables

    A hash table is one of the data structures implementing an associative array. It uses an array to store the key-value pairs and a special hash function to compute the index of a particular pair from its key. As an implementation of a map it usually performs better than a binary search tree (both put and get work in O(1) amortized time compared to O(log(n)) in a BST), but this depends heavily on choosing the right hash function, load factor and collision resolution method.

    Hash tables are used to implement maps, sets, database indexes and caches. They are available in almost any modern programming language. Some languages provide them as a built-in syntactic construct (Python, Ruby); others have dedicated classes in their standard libraries (std::unordered_map in C++ and HashMap in Java).

    Implementation of hash table

    Idea: a hash table is always implemented as an underlying array of entries. Depending on the collision resolution method, the entries are either key-value pairs or chains of them. There are several methods of collision resolution; the most popular ones are open addressing and chaining.

    Open addressing implements the following strategy: when a collision occurs during insertion, try to find a different place in the array for the key by sequentially probing other slots until a free one is found. When searching for a value, the pairs are traversed in exactly the same way. There are several ways of choosing which slots to probe; the easiest one, called “linear probing”, simply looks at the slot following the current one (see the sketch below).
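
    The following open addressing sketch with linear probing is not part of the article’s repository and is shown only for contrast with the chaining-based implementation below; it omits resizing and deletion for brevity:

    package com.dancingrobot84.algorithms.datastructures.impl;
    
    public class LinearProbingMap<K, V> {
        private final K[] keys;
        private final V[] values;
    
        @SuppressWarnings("unchecked")
        public LinearProbingMap(int capacity) {
            keys = (K[]) new Object[Math.max(capacity, 16)];
            values = (V[]) new Object[keys.length];
        }
    
        public void put(K key, V value) {
            // Probe consecutive slots until the key or a free slot is found.
            // A real implementation would resize before the table fills up.
            int i = indexFor(key);
            while (keys[i] != null && !keys[i].equals(key)) {
                i = (i + 1) % keys.length;
            }
            keys[i] = key;
            values[i] = value;
        }
    
        public V get(K key) {
            int i = indexFor(key);
            while (keys[i] != null) {
                if (keys[i].equals(key)) {
                    return values[i];
                }
                i = (i + 1) % keys.length;
            }
            return null;
        }
    
        private int indexFor(K key) {
            // Mask out the sign bit so negative hash codes do not give a negative index.
            return (key.hashCode() & 0x7fffffff) % keys.length;
        }
    }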

    Chaining solves the collision problem differently. It stores buckets of pairs in the underlying array; when a collision occurs, the new pair is dropped into the corresponding bucket. When searching for a value, the bucket is searched. Usually buckets are implemented as linked lists of pairs. The implementation below uses this method.

    Another important part of a hash table implementation is choosing the right hash function. The main requirement on a hash function is a uniform distribution of its values. Uniformity is crucial; otherwise the number of collisions rises, causing bad performance of the hash table. For a particular implementation this condition can be loosened to uniformity at specific table sizes. The implementation below relies on the hashCode function of Java’s Object for simplicity and performs no additional hashing. This may affect performance, as programmers writing classes are not obliged to guarantee uniformity of hashCode.

    As the number of items in a hash table grows, the chance of collisions rises too; that’s why it is sometimes necessary to resize the underlying array and re-insert all key-value pairs. To decide when to perform this action, a load factor indicator is used. It is calculated as the number of inserted key-value pairs divided by the size of the underlying array. In the implementation below a load factor of 0.75 indicates that resizing is needed; the number was chosen manually, though in the real world you may want to run some experiments to choose it more precisely.

    package com.dancingrobot84.algorithms.datastructures.impl;
    
    import com.dancingrobot84.algorithms.datastructures.Map;
    
    public class HashMap<K extends Comparable<? super K>, V> implements Map<K, V> {
    
        public static final int DEFAULT_CAPACITY = 2;
    
        public static <K extends Comparable<? super K>, V> Map<K, V> empty() {
            return new HashMap<K, V>(DEFAULT_CAPACITY);
        }
    
        public static <K extends Comparable<? super K>, V> Map<K, V> emptyWithCapacity(int capacity) {
            return new HashMap<K, V>(capacity);
        }
    
        @Override
        public V get(K key) {
            checkKey(key);
            Entry<K, V> entry = entries[findIndex(key)];
            for (; entry != null; entry = entry.next) {
                if (key.compareTo(entry.key) == 0) {
                    return entry.value;
                }
            }
            return null;
        }
    
        @Override
        public void put(K key, V value) {
            checkKey(key);
            int index = findIndex(key);
            Entry<K, V> entry = entries[index];
            for (; entry != null; entry = entry.next) {
                if (key.compareTo(entry.key) == 0) {
                    entry.value = value;
                    return;
                }
                if (entry.next == null) {
                    break;
                }
            }
    
            Entry<K, V> newEntry = new Entry<K, V>(key, value);
            if (entry == null) {
                entries[index] = newEntry;
            } else {
                entry.next = newEntry;
            }
            size++;
    
            if (isFull()) {
                resize(entries.length * 2);
            }
        }
    
        private int size;
        private Entry<K, V>[] entries;
    
        @SuppressWarnings("unchecked")
        private HashMap(int capacity) {
            size = 0;
            entries = (Entry<K, V>[])new Entry[Math.max(capacity, DEFAULT_CAPACITY)];
        }
    
        private static class Entry<K, V> {
            public K key;
            public V value;
            public Entry<K, V> next;
    
            public Entry(K key, V value) {
                this.key = key;
                this.value = value;
            }
        }
    
        private void checkKey(K key) {
            assert key != null : "Null keys are forbidden";
        }
    
        private int findIndex(K key) {
            // Mask out the sign bit so negative hash codes do not give a negative index.
            return (key.hashCode() & 0x7fffffff) % entries.length;
        }
    
        private boolean isFull() {
            return size >= 0.75 * entries.length;
        }
    
        @SuppressWarnings("unchecked")
        private void resize(int newCapacity) {
            Entry<K, V>[] oldEntries = entries;
            entries = (Entry<K, V>[])new Entry[newCapacity];
            size = 0;
    
            for (int i = 0; i < oldEntries.length; i++) {
                for (Entry<K, V> entry = oldEntries[i]; entry != null; entry = entry.next) {
                    put(entry.key, entry.value);
                }
            }
        }
    }

    1. Cormen, Leiserson, Rivest, Stein. Introduction to Algorithms