I didn’t want to touch upon anything official in this travelogue, but this is one episode that I can’t skip – and I’m sure a lot of IT folks will relate to it easily.
Ever since I took over the onsite support for the application, there was some issue or other in the batch jobs every night – not because of anything I caused, but I guess destiny wanted me to be present in the middle of a storm. Or perhaps the application didn’t like the idea of Sini deserting it to go on vacation. All three working days so far had brought problems at night; only Saturday and Sunday were quiet (thankfully we didn’t have programs running on the weekend). On Thursday and Friday Sini was there with me, but on Monday I was all alone on the battlefield.
It all began with a call at around 3:30am; I had conveniently placed Sini’s mobile phone within arm’s reach for the night and was able to pick it up in one go in the dark. The calls come from the production job monitoring group – if any job fails, they ring up the respective application support person to check on the issue. Our application was fairly stable, and before my arrival relatively little night support had been required. But all that would change now. Wreet didn’t stir from his slumber even when I switched on the light for a short while.
It took me a while to get accustomed to the voice at the other end – every night it was a different set of people monitoring the jobs, and each person had a different accent. This time it wasn’t a failure – a job had been running for too long, 4 hours or so, and they were worried about it. As usual I told them that I’d log in within a few minutes to check on the job. We had our own internal messenger application – everyone working in the company used it – something like a private Yahoo messenger service. Ah, the pains of modern technology – the mobile phone, the laptop, the messenger, virtual private networks, the internet – the bane of our times; the world was becoming a very small place to live in. I’m sure most people in the IT field have cursed these technological advances at some point in their careers!
There was nothing much I could figure out from the job statistics – instead of completing in 30 minutes, the job had been running for 4 hours tonight! There was no way to determine how long it would take to complete, so we were kind of stuck. The production support team was quite helpful – they did their best to pull up whoever I needed online: database support, the database administrator and so on. But till 5am we had no answers. The best clue we got was that the job wasn’t stuck; it was running and reading records from a database. Apparently all my worries began because, for the first time, our application had a few million records in three or four tables – they had been loaded just a few days or so before my arrival, and the application had never been tested on a large amount of data till now. The main database admin wasn’t available early in the morning – we only had an assistant on call. He checked on a few things but we couldn’t figure out the issue. Finally we decided to cancel the job; the call was mine – I didn’t really have any choice – it was just a hope. I checked some intermediate files and came close enough to identify the program where the job had been held up. I asked production support to restart from that particular program – hoping that perhaps it was just a case of the database being locked during the earlier run. What a wonderful start to the day!
And the worst was yet to come… what do you do when you are at your wits’ end?