I have been out of the programming circuit for a few years and have been looking at getting back into it. My traditional programming style is an ssh window into my server and all my editing takes place on a development server, in vi.
Recently, I’ve been trying to work out a decent way to determine how the world sees your connectivity from within your network. Essentially, I wanted to simulate accessing one of my locally connected machines from the Internet. Typically, you have to subscribe to a third-party service to do this for you. Coincidentally, I have been reading up on Google App Engine and saw the potential in using GAE for my purpose. I could envision writing a monitoring system that runs entirely on GAE. Unfortunately, I had no programming experience in Python or Java. I did see that PHP has been ported to Java and that someone got PHP running on GAE. The possibility of either creating a new monitoring system in PHP or modifying an existing PHP-based monitoring system entered my mind.
I decided that rather than stay with PHP, I would use this project as a way to learn Python. I started digging into Python resources and contemplating how I would want my monitoring system to work. Ultimately though, I decided I didn’t want to create a brand-new monitoring system (even a basic one) when existing ones such as Nagios and Zabbix do perfectly well. In my research, I found a project called mirrorrr that used GAE as a web proxy.
This solution was immediately obvious. My existing monitoring system (Zabbix) has support for fetching web pages. I could place a file with the word “OK” on my local web server and then fetch it through GAE. I could even determine through the returned page whether my server was down or if GAE was down.
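A minimal sketch of that check, written in Python rather than as a Zabbix item. The proxy URL, the file name, and the state names are placeholders of mine, and I am assuming that an origin failure shows up as an HTTP error from the proxy; how mirrorrr actually reports one may differ.

```python
import urllib.error
import urllib.request

# Hypothetical URL: the GAE app name and the target host are placeholders.
PROXY_URL = "https://example-proxy.appspot.com/www.example.com/status.txt"


def classify(status, body):
    """Map an HTTP status (or None) and a page body to a coarse state."""
    if status == 200 and body.strip() == "OK":
        return "up"
    if status == 200:
        # The proxy answered, but the page does not hold the expected word.
        return "unexpected content"
    if status is None:
        # No HTTP response at all: the proxy itself could not be reached.
        return "proxy unreachable"
    # Assumption: an HTTP error from the proxy points at the origin server.
    return "origin error"


def check(url=PROXY_URL, timeout=10):
    """Fetch the status page through the proxy and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return classify(err.code, "")
    except (urllib.error.URLError, OSError):
        return classify(None, "")
```

Keeping `classify` separate from the network call makes the decision logic easy to test without a live proxy.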
I set to work testing out the mirrorrr code under my own account. The major issue I observed is that mirrorrr is configured to cache pages, meaning that when I changed my OK to FAIL, mirrorrr never updated the page. In the closed-source world, that is the end of the story. However, since this is an open-source project, this was an opportunity.
I’ve been wanting to get back into programming, and when I start back, I want to be familiar with using an IDE (namely Eclipse). In preparation for creating a monitoring system, I had set up a development station in (gasp) Windows Server 2003. I connect to this via remote desktop and generally leave Eclipse running 24x7. I had also gone through the steps of installing the PyDev plugin, the GAE plugins, and SVN plugins.
The Process
I downloaded a copy of the code using SVN checkout and set to work editing the mirror.py file to disable CACHE. I call this the “dive in and learn to swim later” process. Here, I could make changes to existing code and test them out immediately. In fact, the SDK for GAE works as a sort of mini-server. Once I run the code within the SDK, any change I make to the source affects the running instance.
I was able to read through and alter the code to allow me to switch between caching and non-caching. I ran into an issue with the “recent urls” feature, which shows the last 5 urls you have visited. When you are not caching your data, this list never gets set and the code starts throwing errors. I realized I would have to improve that section of the code before I could truly implement a configurable “enable cache” option.
At this point, I backed out of my file and considered my options. I wanted to make two distinct changes to the source, one of which requires the other. The author of the project hasn’t made a change to his (her?) code since December of 2008. However, I did see recent entries in the Wiki, indicating this wasn’t an abandoned project. I realized that to truly make my changes worthwhile, I should try to get them included back upstream. To do so, I should submit each change separately. That required more tracking than I had been doing.
So, I took care of another todo list item. I went ahead and set up an “official” repository server for YourTech, checked out a fresh copy of mirrorrr, and then imported it into mine. Now I could work. I imported the project from my repo server into Eclipse and started recreating my work. First I added a feature to disable the recent urls. At the same time, I made an improvement by moving a chunk of code into a self-contained method (which is apparently what Python calls functions, as near as I can tell at this juncture). Once this was committed, I went ahead and recreated my work on disabling the cache.
Once I was done, you could enable or disable each feature separately. However, if you disabled the cache but left recent urls enabled, your recent urls would never update. On the flip side, if recent urls were recorded before the cache was disabled, you would still see the urls recorded up to that point. Someone may want that feature: they could start up mirrorrr, visit several links, then disable the cache, preserving those links on the main page forever.
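The interaction between the two switches can be modelled with a short sketch. This is my own illustration, not mirrorrr’s code: the flag names, the helper functions, and the in-memory dictionary standing in for the real cache are all hypothetical. The one assumption taken from the behaviour described above is that recent urls are recorded only when a page is stored in the cache.

```python
# Hypothetical flags and helpers; these are not mirrorrr's actual names.
ENABLE_CACHE = True
ENABLE_RECENT_URLS = True
MAX_RECENT = 5

cache = {}        # stand-in for the real cache backend
recent_urls = []  # most recent first


def fetch_origin(url):
    """Placeholder for the real fetch from the origin server."""
    return "content of " + url


def record_recent(url):
    """Move url to the front of the list and keep only MAX_RECENT entries."""
    if url in recent_urls:
        recent_urls.remove(url)
    recent_urls.insert(0, url)
    del recent_urls[MAX_RECENT:]


def get_page(url, enable_cache=ENABLE_CACHE, enable_recent=ENABLE_RECENT_URLS):
    """Return page content, using the cache only when it is enabled."""
    if enable_cache and url in cache:
        return cache[url]
    content = fetch_origin(url)
    if enable_cache:
        cache[url] = content
        # Assumption: recent urls are recorded only on a cache store, so
        # with the cache disabled this list is never touched.
        if enable_recent:
            record_recent(url)
    return content
```

Calling `get_page("a")` with both switches on and then `get_page("b", enable_cache=False)` leaves `recent_urls` holding only `"a"`, which is the frozen-list behaviour described above.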
At this point, the only remaining task was to submit my changes to the project maintainer. I opened issues 6 & 7 and now I await response.
Conclusion
The more I use Eclipse, the more I like it. The ability to perform every step of the process in one program is extremely useful. Steps like comparing history or checking out a specific version are much easier to grasp than when you are tooling around the command line. In the shell, I would typically have most of my files open as background tasks. Reverting a file from Subversion required switching back to the file, closing it, then reverting. In Eclipse, it’s a right-click operation, regardless of whether the file is open or not.
Python’s syntax is a bit weird coming from a Perl and PHP background, but it’s learn-able. As I plan to make several more improvements to mirrorrr, I hope to become proficient in this language as well. However, I may be picking up a Perl project in the near future using the Mojo toolkit, so everything is up in the air.