How two Uni Students built a better Census site in just 54 hours for $500 – EFTM


$500 and it's rated to handle four times the traffic the ABS were ready for.

One of the fundamental problems with the 2016 online census was the architecture. Not the building the ABS works in, but the way the computer system built to handle millions of Australians was designed. It turns out two uni students designed a better way to do it in just 54 hours over a weekend – at a cost of only $500.

If there’s one thing a computer programming student loves, it’s a hack-a-thon.  Now, for the uninitiated, this is not an event where smart people hack innocent people’s computers over and over again – it’s a concentrated period of time within which teams are required to come up with an idea and build it.

Pizza is a vital ingredient, as is a lack of sleep.


But for two Queensland first-year uni students, the idea was simple – Make Census Great Again.

Austin Wilshire and Bernd Hartzer are both from the Queensland University of Technology: Austin is studying IT, majoring in Computer Science, while Bernd is studying Creative Industries and Information Technology.


They teamed up and set to work on their Trump-like goal for the failed 2016 Census.

And their approach was vastly different to that of the ABS and their contracted developer, IBM.

Scale.  That’s right, Austin and Bernd wanted to design for scale.

The traditional approach to designing web services is “on-premise” – this means that somewhere there are a bunch of computers all built to serve up the content – in this case, census forms.  This is what IBM and the ABS did with the actual Census.

But at the Code Network “winter hack-a-thon” on the weekend, these two smart cookies went for a “cloud-first” design which can quite simply “infinitely scale”.

What this means is you use a service like AWS (Amazon Web Services) and build the software to simply grow: as load increases, it automatically deploys more capacity so it can continually cope with the demand.
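To make the idea concrete – this is only a sketch, and every number and name in it is invented for illustration, not taken from the students' project – the heart of an auto-scaling policy is a simple rule: measure the load, divide by what one server can handle, and adjust the fleet.

```python
import math

def instances_needed(requests_per_second, capacity_per_instance=250,
                     min_instances=2, max_instances=100):
    """Decide how many servers to run for the current request rate.

    Cloud platforms such as AWS Auto Scaling apply rules like this
    automatically; the thresholds here are illustrative assumptions.
    """
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Keep a small baseline always running, and cap the fleet to cap the bill.
    return max(min_instances, min(needed, max_instances))

# A quiet evening needs only the baseline fleet...
print(instances_needed(100))
# ...while a census-night spike grows the fleet to match demand.
print(instances_needed(10_000))
```

Real platforms also scale back down as load drops, which is where the cost saving over a fixed on-premise fleet comes from.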

Think about it – does Amazon.com go down often?


They built the site, and even load tested it – remember, the ABS spent almost half a million dollars on load testing their failed site? In addition to the $9.6 million to design and build it?

On the weekend “Make Census Great Again” was load tested to 4 million page views per hour.  And 10,000 submissions per second – insane numbers.

The ABS proposed and tested their site for 1 million page views per hour – the magic “260 submissions per second” they keep banging on about. Their testing? $469,000. Testing for “Make Census Great Again”? $0.

That’s right, there are open-source (i.e. free) load-testing solutions out there which – ironically – were also designed in just a couple of days, like this very project.
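The essence of those tools is small enough to sketch. The snippet below is not the tool the students used – `run_load_test` and its numbers are invented for illustration – but it shows the core trick: fire requests from a pool of concurrent workers and count what comes back.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def _attempt(make_request):
    """Run one request, returning 1 on success and 0 on any failure."""
    try:
        make_request()
        return 1
    except Exception:
        return 0

def run_load_test(make_request, total_requests=1000, concurrency=50):
    """Fire make_request() total_requests times across a thread pool.

    Returns (successes, failures, requests_per_second). Real open-source
    tools (JMeter, Locust, Tsung, ...) add ramp-up schedules, latency
    percentiles and distributed workers on top of this core loop.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: _attempt(make_request),
                                range(total_requests)))
    elapsed = time.perf_counter() - start
    ok = sum(results)
    return ok, total_requests - ok, total_requests / elapsed

# In a real test, make_request would POST a census form over HTTP; here a
# stub stands in so the harness itself can be demonstrated cheaply.
ok, failed, rps = run_load_test(lambda: None, total_requests=200, concurrency=20)
print(ok, failed)
```

Distributing that loop across a handful of cheap cloud machines is how small teams can generate census-scale traffic without a half-million-dollar testing contract.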


Austin Wilshire & Bernd Hartzer

How would it cope with a Denial of Service attack, though? “Fine,” they say. “It would have racked up a bill, but it would have survived.”

And that bill – nothing compared to the budget the ABS has spent on Census 2016.

This proves a couple of things.  Firstly, innovation is alive and well in Australia.

Secondly, Governments have a habit of over-engineering everything, and it’s that simple thing which ruined Census 2016.

$500.  And 54 hours of development time by two young first-year Uni Students.  Take that Malcolm Turnbull – Take that ABS.  Perhaps worst of all – take that IBM.

54 hours? That’s the time allowed at the Code Network Hack-a-thon, it’s also pretty damn close to the amount of time the Census site was down for too.


From left to right – Adam Hibble (me), Bernd Hartzer, Austin Wilshire, Peter Laurie (Judge), Mike Ciavarella (Judge)

For the record, Code Network is a volunteer student-run organisation based at QUT. It was founded last year, its aim is to help produce the best software developers on the planet, and it has 1,500 members.

We all know who to ask for help on the next big government project, don’t we?

As for Austin and Bernd – they won a Microsoft Surface Pro 4 donated by event sponsor Technology One.

Web: Make Census Great Again

Photos by Mathew Taylor

 


Trevor produces two of the most popular technology podcasts in Australia, Your Tech Life and Two Blokes Talking Tech. He has a weekly radio show on 2UE, as well as appearances across the country and regularly provides Technology Commentary to Channel 9’s Today Show and A Current Affair. Father of three, he is often found down in his Man Cave. Like this post? Buy Trev a drink!
39 Comments on this post.
  • HH
    16 August 2016 at 2:54 pm

    Good on them for working on this. However, I feel the article will do more harm than good by trivialising the effort required to build scalable systems. It’s neither easy nor cheap.

    • Trevor Long
      17 August 2016 at 12:24 am

      Nope, not cheap. But let’s just assume for a moment that it was worth investigating, trying, and potentially spending the SAME amount they did – but on something that WORKED

      • HH
        17 August 2016 at 4:33 pm

        I agree that it’s worth investigating, and that perhaps there are far better choices than IBM. However, the comparison is not fair, since it’s not apples-to-apples.

        As a disclaimer, I am not affiliated with IBM, its subsidiaries, nor the Census project. I am however, making a living working on large and scalable web systems (although not Twitter or Facebook-scale).

        Note that I can’t find the source code. Normally it should be publicly available as part of a hack-a-thon, so my comments below are based on inspecting the HTML source code and the public network traffic of the site linked in the article (http://makecensusgreatagain.com/). My opinion might change if I get access to the source code.

        The claim of handling (x) times more load does not hold water, simply because it is not grounded in reality.

        TL;DR: the primary reason why it can handle much more traffic than the Census website is that it doesn’t seem to be doing much at all.

        The front-end looks to be a static HTML site hosted from an AWS S3 bucket, with a number of assets leeched from the Census CDN. The form simply does a JSON AJAX POST and redirects once a response is received.

        The back-end form processing looks to be a Node.js application (going by hints in the endpoint address, but I can’t be 100% sure), with AWS CloudFront for caching. I can’t tell what kind of persistence technology they are using, or whether they are using any at all.

        Using a static HTML front-end and/or heavy caching is a good strategy to increase throughput. However, the students’ project does not deliver various functionalities often needed in lengthy, multi-page form websites such as the Census website, for example:

        – session management, e.g. I got disconnected and want to resume from last point
        – conditional form branching, e.g. if I answer A, I can skip section 5
        – data validation, e.g. date must be in dd/mm/yyyy format
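To illustrate just the third item – this is a hypothetical check, not code from either project – even a single validation rule like HH's dd/mm/yyyy example means extra server-side work on every submission:

```python
from datetime import datetime

def valid_ddmmyyyy(value):
    """Return True only for real calendar dates in day/month/year form."""
    try:
        # strptime both parses the format and rejects impossible dates.
        datetime.strptime(value, "%d/%m/%Y")
        return True
    except (ValueError, TypeError):
        return False

# A well-formed date passes; a wrong format or an impossible date does not.
print(valid_ddmmyyyy("09/08/2016"))
print(valid_ddmmyyyy("2016-08-09"))
print(valid_ddmmyyyy("31/02/2016"))
```

Multiply that by dozens of fields, branching rules and session lookups, and the per-request cost – and hence the hardware needed per million users – climbs quickly.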

        Once you add the above and some other essential functionality, you’ll see performance drop significantly, mainly because the web server has to do a lot more work than just passing data to a store.

        In addition, most of the time, the developers do not have full control over what kind of infrastructure or other piece of software they can deploy their code to. Other commenters have mentioned restrictions due to regulation, auditability, available commercial support, and there might also be many client-imposed constraints.

        Pile on constraints after constraints (they can all be valid constraints, mind you), various parts of the system can easily become a bottleneck, and delivering scalable systems becomes a really tough job.

        I hope you get my point.

        PS. A note for the students: simply absorbing DDoS traffic is not a good strategy, because it’s far cheaper for the attacker to launch, increase and sustain the attack volume than for the client to absorb the growing inbound traffic. Hence a mitigation strategy at the network layer is really important – which is why rejecting that was IBM and the ABS’s biggest blunder, IMO.

  • Sandeep
    16 August 2016 at 2:55 pm

    Governments do not over-engineer. In this case it’s a woeful example of incompetence! Since this is Australia it’s hard to fathom corruption, bribery or underhandedness, but this whole thing certainly stinks of a pitiful attempt at something whose scale they knew by its very definition! It’s the census, for god’s sake.

    • Trevor Long
      17 August 2016 at 12:23 am

      Spot on – SCALE – they SHOULD HAVE KNOWN

  • Craig James
    16 August 2016 at 6:24 pm

    I don’t think any integrator will want to touch Census 2021. Everyone is an expert / tall poppy syndrome continues down under…

    • Trevor Long
      17 August 2016 at 12:23 am

      Forget the integrators, no politician would touch it

  • Eddie
    16 August 2016 at 8:45 pm

    While that is possible, the solution lacks the operational support necessary once it has gone live. Moreover, the solution would not meet Privacy Act and National Archives Act standards, which are the real reasons why IBM didn’t use widely accepted technologies to cater for this load.

    • Trevor Long
      17 August 2016 at 12:22 am

      No, the reason they didn’t do this or something similar is that they were lazy and ill-prepared

  • kirb
    16 August 2016 at 9:32 pm

    Kudos to them for winning, but that’s some… interesting code. It seems extremely boilerplate (made from a template); has a fraction of the fields; doesn’t validate fields (I can click Submit without having entered anything); and the backend seems likely to be boilerplate/sample code. Not trying to knock them or make myself seem better… it just seems strangely cobbled together, even for a hackathon. The missing pieces would no doubt add more necessary processing on the server, skewing their load-test results.

    • Trevor Long
      17 August 2016 at 12:22 am

      Point missed – this was about scalable load and affordable load testing, proving that the ABS could have, and should have, done more to find an agile approach to load and security.

  • Casey
    16 August 2016 at 10:16 pm

    Fortunately, for very valid security reasons, we have legislation in place that prevents private citizen data from being hosted on (or even passing through) unsecured data centres, let alone foreign-hosted data centres. Hackathons are great, but thinking they’re useful for anything beyond producing proofs of concept is bad reporting. One of the key features of cloud data centres is that they’re shared: when you’re not using the compute power, someone else is, and having someone else running code on the same metal the census was running on is a risk no security expert in their right mind would be willing to accept. A private cloud (which is probably what they would have run) is about as good as you could go.

    • Trevor Long
      17 August 2016 at 12:21 am

      With five years and hundreds of millions of dollars, the ABS should have found a way to do just that.

    • Doobs
      17 August 2016 at 6:50 pm

      This! The article trivialises the constraints of collecting and storing private census data.

      Anyone can create a form and host it on AWS – I’m sure it was one of the first ideas tabled at a scoping session.

      It seems convenient to ignore all that to say “Hey wow, look a bunch of kids did better than the government” to generate clicks.

      • Trevor Long
        17 August 2016 at 8:12 pm

        No, this article starts the conversation.

        If a scalable cloud environment could enable online activity like this, how do we as a nation work to build or certify such an environment to deal with our requirements? Why are we just brushing these ideas aside? We have to move forward – the Census set us back a decade.

  • [email protected]
    16 August 2016 at 10:48 pm

    Good job, but you’re hosting every Australian citizen’s private data overseas, on US hardware run by a US company heavily intercepted by the NSA.

    That wasn’t an option ABS.

    • Trevor Long
      17 August 2016 at 12:20 am

      Good job jumping to conclusions.

  • Erica
    17 August 2016 at 12:13 am

    This is awesome! One thing though, with privacy being a major concern, data needs to be hosted in Australia. Does anyone know if there is something like AWS available hosted in Australia?

    • Trevor Long
      17 August 2016 at 12:20 am

      Yep, AWS has Australian servers. But in reality it’s a bigger question of being a bit more innovative, doing better planning, and not going the “safe” old-school route with things. Shouldn’t governments be agile if they’re asking us to be?

      • Mark D
        25 August 2016 at 10:12 am

        So you think that outsourcing is innovation; how sad for you.

        • Trevor Long
          25 August 2016 at 11:12 pm

          You should join a Union. How sad for you:)

    • David
      17 August 2016 at 9:27 am

      An interesting idea, equally interesting calls to make!

      Off the top of my head the first major stumbling block to achieving the purported efficiency and affordability of the solution is a little thing called the law.

      Rightly so: the legislation within both the Privacy and National Archives Acts categorically prevents private citizen data from being hosted on (or passing through) data centres that aren’t weapons-grade secured.
      One of the key features of cloud data centres is that they’re shared: when one user is not using the processing power, another is. Having someone else running code on the same metal the census was running on is a risk no security expert in their right mind would be willing to accept.

      Then there’s the effort required to actually build a scalable architecture of those proportions – not forgetting the safeguards and redundancy measures that need to be in place, AND ensuring reliable availability of qualified, proficient operational support once it goes live. Know of a team of specialists, trained in national security protocols, with knowledge flexible enough to coordinate the variety of architectures, hardware and procedures likely to be encountered?

      Facebook would shit itself. I reckon if these kids knew wifi is metered in jail, they would too.

    • Thurstan Hethorn
      18 August 2016 at 7:16 pm

      There is AWS served out of Sydney; however, for redundancy they back up across their various sites. Also, as pointed out, it is an American company. From the short stint of work I’ve done with their innovation department, it is unfortunately not currently an option allowed within government services.

      Also according to the IT guy I was working with on a government project, IE7 still needs to be supported. This makes me cry and I don’t even have to do the coding. It does however add much more work to a project.

      The government is trying to be more agile, but it is going to be a long journey. This census has it quite a bit easier than other services though, because it doesn’t have to tie in to or depend on other services that are likely not yet set up to be nimble.

      Even with all the hoops and jumps and departments and people such endeavours have to pass through, it is still disappointing that what appears to be a webform with a secure database has cost so much and delivered so terribly. And this is before going into the appalling communication about the state of the website, and the dubious claims of being DDoSed.

  • Jarred
    17 August 2016 at 12:20 am

    While there was obviously a gross technological mishandling of the census by IBM (and I can’t understand those blaming the Gov btw, what do you want, Turnbull up late provisioning autoscaling groups?), I can understand why they would want to keep the data and servers in house. There are huge data security concerns for a service that sensitive beyond just having to scale. Sounds like that’s what they were prioritising?

  • Camm
    17 August 2016 at 8:10 am

    FYI: Government ISM requirements preclude the type of personal data collected by the census from being hosted in public clouds.


  • Joe
    17 August 2016 at 11:22 am

    Mate, as a guy who looks after data for a living, I don’t think you know what you’re talking about. If insurance and finance companies don’t want to make the leap yet because of APP concerns, then government (especially the census, which now stores individual details) will not touch cloud with a barge pole. In your rush to praise students, don’t trivialise an important concern.

    • Trevor Long
      17 August 2016 at 11:24 am

      Appreciate your concern, but you’re missing the bigger point here – these guys load tested to four times the capacity the ABS assumed they would need and paid half a million dollars for! Insane

  • Gonzo
    17 August 2016 at 4:29 pm

    What a silly article. As previous commenters have pointed out, hackathons are just proofs of concept and never include any of the “hard” things to get right, which is why hackathons work.

    Government legislation doesn’t allow census data to be stored on overseas owned or hosted infrastructure, which automatically excludes AWS and hence, this solution.

    Although IBM have obviously done an awful job, claiming that what is really just a glorified form submission page is any kind of replacement for the census is hogwash.

    • Trevor Long
      17 August 2016 at 4:50 pm

      I think it’s all summed up by “Although IBM have obviously done an awful job” – that’s what we’re trying to address here – there are many more ways to cut this one, and the ABS failed to innovate in that sense.

  • Ben Finney
    17 August 2016 at 5:34 pm

    Yes, someone else’s computer could do the work. (Remember folks, there is no cloud out there; it all resolves to a real computer controlled by someone else.)

    If the ABS got some other party to handle confidential information – mandated from every Australian resident by law – how would they ensure data sovereignty? How much would it cost to ensure that, compared to keeping the computers and the data where they can be verified secure?


    • Trevor Long
      17 August 2016 at 8:13 pm

      How about we spend 5 years building that or making it safe/secure – ready for the next census?

      • Richard
        17 August 2016 at 11:49 pm

        Did they spend 5 years building it? Wasn’t the original work request signed off under T. Abbott in late 2014?

  • IA
    18 August 2016 at 9:17 am

    They may have done some interesting architectural stuff on the backend, but as others have mentioned, the website itself is terrible for what I imagine the main goals would be: security, scalability and data validity. No form validation at all – load the page, press submit, no problems. Come on now. HTML and CSS not optimised, with lots of commented-out code blocks? Pulling in an external JS library? There were probably 100+ requirements for a project like this; these guys provided a non-workable solution to one or two (the cloud), and we have an article claiming it is great.

    Trevor, if as you say you are trying to show IBM did a terrible job, do that – this website does nothing to advance that argument at all.

    Perhaps also fix the label on the email field to say “email” instead of “name” on this website first, though.

  • D
    18 August 2016 at 9:23 am

    “…was load tested to 4 million page views per hour. And 10,000 submissions per second – insane numbers.”

    Surely the author is aware that DDoS attacks don’t have to use HTTP/HTTPS; load testing is quite irrelevant here. Apparently AWS would have held up “fine” to a DDoS attack, though. Very scientific insight there. No worries mate, she’ll be right.

    The question of “Why didn’t ABS have stronger DDOS protection in place?” can and should be asked, but the idea that the entire census form could be replaced with a basic HTML form on AWS is rubbish.

  • IT DUDE
    18 August 2016 at 10:27 am

    The government has way too many regulations and legacy systems to make simple elegant solutions possible. The consultants had their hands tied and were unable to deliver an optimal solution. On top of that, add the usual laziness, politics and incompetence, you have the recipe for a disaster.

    The census was just a simple form capturing a number of text fields, with some conditional logic that can live on the client side. It does not require sessions, since anyone can re-fill the form in less than 5 minutes. Scalability was the only real issue here, and using on-premise infrastructure was the main cause. The cloud had everything needed for this to work, but privacy acts and shitloads of other regulations prevented that from being part of the equation.

  • Ash
    18 August 2016 at 11:38 am

    2 students take 54 hours to build a dumb static web form, and host it on public cloud infrastructure. This is not impressive at all. This isn’t even 0.1% of what IBM did for the census project, and it took them 54 hours to do what an average web developer could do in 4 hours. Also, 2×54 hours for $500 comes to $4.62/hour, so their estimates are at best wrong.

  • Carly
    23 August 2016 at 7:55 pm

    This is a great example of how everybody who has a job probably shouldn’t be in it. Move over, stop playing old men’s club, and let some real talent in.

  • Poopoo
    29 September 2016 at 10:26 am

    Hahahaha – two boys with no experience in providing secure web forms managed to win a $500 crappy tablet. There is no validation, either client side or server side, and the load-testing claims are just claims – you could probably load test the census site using the same method and it would come out in front. It is a big claim to say they have beaten the ABS, IBM and the Federal Government at something that could not even match the complexity involved in protecting valid user data.
    Good luck to the boys, but I think you have drawn a long bow, and this article is fantasy – your fantasy.
