Wednesday, June 21, 2006

Blogging interfaces

Google/Blogger have good doco on-line:

And there are Perl modules on CPAN.

O'Reilly has a discourse:
[and many others]

I'd like to find some free application to edit the XML or post the blog locally.
O'Reilly have a thing that publishes to "iDisk" on .Mac - but that's not Blogger or Atom.

Here's some command line posting stuff. Uses "curl".

What BLOGS do I have?
curl -u "$Auth" -o blog.1

Which will give for each blog:
<link href="" rel="" title="stevej-in-oz" type="application/atom+xml"/>
<link href="" rel="service.feed" title="stevej-in-oz" type="application/atom+xml"/>
<link href="" rel="alternate" title="stevej-in-oz" type="text/html"/>

Download a blog
Get the "" tag.
curl -u "$Auth" -o blog.2

Post an entry
The hard bit here is the formatting of the XML.

curl -u "$Auth" -o blog.4 -H "Content-type: application/xml" -d @blog-post.1
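For what it's worth, here is a rough sketch of how the `blog-post.1` file used above could be written out with a here-document. It assumes the Atom 0.3 entry format (namespace `http://purl.org/atom/ns#`); the title and body are placeholders, and whether Blogger accepts exactly this layout is an assumption to check against their doco.

```shell
# Sketch only: element layout assumes the Atom 0.3 draft format.
# The title and body text are placeholders, not a real post.
cat > blog-post.1 <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://purl.org/atom/ns#">
  <title mode="escaped" type="text/plain">Test post from curl</title>
  <content type="application/xhtml+xml">
    <div xmlns="http://www.w3.org/1999/xhtml">Hello from the command line.</div>
  </content>
</entry>
EOF
```

The quoted 'EOF' stops the shell from expanding anything inside the XML, so dollar signs and backticks in a post body go through untouched.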

some useful searches
grep service.feed blog.1
grep service.edit blog.1
grep blog.[12]
grep service.delete blog.[12]
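The greps above give whole lines. If you only want the `rel` and `href` values, something like the following might do. `list_links` is a made-up helper name, and it assumes each `<link .../>` sits on one line with double-quoted attributes, as in the sample output earlier. It is not a real XML parser.

```shell
# Hypothetical helper: print "rel<TAB>href" for every <link> element in a file.
# Assumes one <link .../> per line and double-quoted attribute values.
list_links() {
    grep -o '<link [^>]*>' "$1" |
    while IFS= read -r tag; do
        # Pull out the two attribute values; either may come back empty.
        rel=$(printf '%s\n' "$tag" | sed -n 's/.*rel="\([^"]*\)".*/\1/p')
        href=$(printf '%s\n' "$tag" | sed -n 's/.*href="\([^"]*\)".*/\1/p')
        printf '%s\t%s\n' "$rel" "$href"
    done
}
```

Usage would be `list_links blog.1`, or `list_links blog.1 | grep service.feed` to narrow it down.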

Saturday, June 17, 2006

The Data Dungeon

This idea came from reading "I, Cringely" of 17 Nov 2005. Cringely says Google is creating a "datacenter in a box" - shipping container, really - "5000 Opteron processors and 3.5 petabytes of disk". That's pretty impressive.

People have been buying servers and building datacenters for years - why should this be exciting? Because it has the potential to lower the cost of the whole datacenter radically - my guess, without doing any calculations, is by a factor of 5 or 10.

I thought about how I'd do it: no fans (they break), which means liquid cooling; no UPS, just direct DC; a single A/C unit; no walk spaces; and no cases for equipment unless they're needed for cooling, RFI or containment. OH&S isn't an issue while it's running - it's sealed.

And you throw away the key. It's the next logical move from "lights out" or "dark datacenters"...
[You may even weld the doors shut.]

A reasonable technology and physical life, without maintenance, would be 3-4 years.

And people like SUN, Dell, IBM and HP could make these things - either sell or lease. And being the owners, could

Google have a very special workload - it scales linearly and consists mostly of short transactions that can be rerun.

Normal commercial workloads are things like:
- webserver (Transaction based, restartable, load-balancer friendly)
- database (long-connection time, persistent, need clustering for high-reliability/availability)
- filer/file server (more like a database)
- email - client or server. Both need reliable data storage, but can take restarts.
- and probably way more...

The whole point of a "Data Dungeon" is replicating at the systems level, not the component level...
You don't need hot-swap power supplies, dual-NICs yadda-yadda-yadda if you have two complete systems that hot-swap.
Commodity hardware is *cheap*. You have to be inventive with your software/systems to design around break-able parts.
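A back-of-the-envelope sketch of why system-level replication works: if two complete systems fail independently, the pair is only down when both are down at once. The 0.99 figure is an assumed availability for a single cheap box, purely for illustration, and `pair_availability` is a made-up name.

```shell
# Illustrative only: 0.99 is an assumed availability for one unit, not a measured figure.
pair_availability() {
    # A pair of independent units is down only when both are down: 1 - (1 - a)^2
    awk -v a="$1" 'BEGIN { printf "%.4f\n", 1 - (1 - a) * (1 - a) }'
}

pair_availability 0.99   # prints 0.9999
```

The independence assumption is the weak point: two containers sharing one power feed or one network link don't fail independently.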

And the parts don't all have to be the same - you'd want some really low-power fanless CPUs for some types of service, and enough top-end high-power CPUs in the mix for those times when too much grunt is not enough...
It's not going to be a box full of just the one thing...

So a "Data Dungeon" - would you ever just have ONE? Nope - the breakable design dictates at least two... Which you can stack in a car-space out the back [shipping containers, remember?]. And when it's time to upgrade, wheel in another one or two, mirror the data, migrate the persistent processes and take away the old ones - all done live in prime-time...

Part of the scheme is running everything in Virtual Machines: only one service to a virtual machine (webserver, email, DB, ...).
It's easy to migrate a service onto a different physical processor if you have load or serviceability problems.
[VMware have some neat new Enterprise tools to do this now.]

And with Mac on Intel, running VMs means you get to run all the major commercial apps:
- all flavours of Windows desktop - via VNC or Citrix remote client to host legacy apps.
- Windows server
- Mac OS X
- z/OS [IBM mainframe]
- Solaris, BSD, and Linux [for those who need a Unix]

The Challenge

From: NeilG
Subject: The Data Dungeon - Notes from the Lab

As a side-effect of our conversation last night, I mentioned you should start blogging this idea ("throw away the key data center") as a way of promoting it. This is my reminder.

I'm pretty sure you said it wasn't a patent item, but you weren't sure how to convince people it was a good idea. Blog it! :)

In that same vein, I was trying to think how to differentiate your blog. Most blogs are just text, and very unappealing visually (to me). I think a paradigm that would work well for you is, a "lab notes" format. We were talking about keeping a notebook as part of the chem and eng and comp sci disciplines. Take that same format (you already have the base text) and blog it with visuals.
E.g., This link

Personally, I think it's cool if you can look inside someone's mind as ideas are being wrought and I think a lot of geeks would agree. That's also the antithesis of the western philosophical tradition, and a good match for your style.