forumei: Introducing Clusto

When you start building a website, there is very little to keep track of. You get a server, install some software, write some code, and release. Nowadays, depending on what you’re doing, you could probably skip some of those steps. You may be lucky enough to grow your traffic, need more servers, and even need to move into a datacenter and start buying equipment by the rack.

There are a lot of tools out there to help you manage your systems. They range from basics such as debian packaging of your software to configuration management tools like puppet that help you keep your servers’ software stacks configured correctly. One of the less glamorous tools that you’ll need is the “infrastructure management system.” This is a class of software that is known by many names, all with varying degrees of functionality. They can range from a spreadsheet with a list of your servers, to a host database with records of your physical inventory, to big enterprise systems that manage whole datacenters.

What seems to happen in most organizations is that they build a custom system that keeps track of their inventory in a database with a custom schema, specific to them, and write tools that know how to interact with that database in a non-portable way. Everyone does it differently which means most organizations end up reinventing the wheel.

Here at Digg, we tried various approaches for keeping track of our infrastructure but were never completely satisfied. We wanted something that, at its core, was a database of all our equipment, what their configurations were, where they were physically located, and how they were physically connected. If we had such a database and a standard way to interact with devices in our infrastructure, we could ask to connect to console on a device and the system would figure out what type of console server it was connected to, what port, and then ask the console server to open a session. We wouldn’t need to care what brand of device we were working with nor the details of how they were connected. There are some systems which are known for handling virtualization with a similar sort of finesse. We wanted something more flexible that could work with real hardware and was systems engineer friendly.

Clusto is our attempt at building such a system. It turns out to be a somewhat more complicated problem than one might expect. After several iterations we think we have a system that demonstrates the principle well and could be generally useful.

At its core, clusto is a library on top of a database (sqlite, mysql, and postgresql have been tested). There are two core datastructures. Entites are identified by a ‘name’, categorized by a ‘clusto_type’, and are associated with a specific ‘clusto_driver’. These entities can have Attributes which can have a ‘key’, ‘number’, ‘subkey’, and ‘value’. This allows you to have an Entity named “server1” with an Attribute key=“hd” number=“1” subkey=“size” value=1000.

On top of those we have a Driver class that provides an interface for interacting with those data structures (almost everything else inherits from Driver). We’ve written drivers for managing most physical things in our infrastructure like racks, servers, and powerstrips, for resources like ipaddresses, and for organizing primitives such as Pools. The drivers are specially written for the thing they manage, so they encapsulate esoteric details like the snmp oid needed to instruct a powerstrip to powercycle a particular port.

Since all the data is in clusto, you can interact with it in fun ways. Query it for information such as how many CPUs are in a particular rack or powercycle only the servers (not the network switch or console server) in a bunch of racks with one command. We’ve built several useful tools on this foundation. Our server provisioning system uses clusto to reboot a server, allocate IPs, PXE boot and FAI a base installation, record important attributes, and prepare it for final configuration with puppet. Then, puppet queries clusto for information about how a node should be configured. This integration with clusto has allowed us to get racks of servers ready for use in minutes, and build brand new clusters, fully configured, with a few simple commands.

Here is a quick demonstration of some simple clusto interactions. First, let’s initialize the db:

import clusto

## connect the db

clusto.connect('sqlite:///example.db')

## initialize a new db

clusto.init_clusto()

Now, let’s build a virtual datacenter with equipment:

dc = clusto.drivers.EquinixDatacenter('dc1')

# a couple resource managers to help with allocation

servernames = clusto.drivers.SimpleEntityNameManager('servernames', basename='server', digits=4)

ipman = clusto.drivers.IPManager('subnet-10.0.0.1', netmask='255.255.255.0', baseip='10.0.0.1')

## build the racks, with servers

for r in range(3):

    rack = clusto.drivers.APCRack('rack%d' % r)

    dc.insert(rack)

    pwr = clusto.drivers.PowerTowerXM('%s-%03d-pwr' % (dc.name, r))

    sw = clusto.drivers.Cisco2960('%s-%03d-sw' % (dc.name, r))

    rack.insert(pwr, [39,40]) # put the power at the top of the rack

    rack.insert(sw, [38]) # put the switch under the rack

    for s in range(1,21):

        # install the servers starting from the bottom of the rack and connect

        # them to the respective switch and power ports port

        server = servernames.allocate(clusto.drivers.BasicServer)

        ipman.allocate(server)

        rack.insert(server, [s])

        # connect eth port 1 on the server to switch port s

        server.connect_ports('nic-eth', 1, sw, s)

        # connect power port 1 on the server to power port s on the powerstrip

        server.connect_ports('pwr-nema-5', 1, pwr, s)

        server.set_attr(key='description', value='this is server numer %d in rack %s' % (s,rack.name))

You’ve made a datacenter with 3 racks each with a power strip, a switch, and 20 servers. Let’s query it to find out what server is connected to port 12 on the switch in the second rack.

rack = clusto.get_by_name('rack2')

sw = rack.contents(clusto_types=[clusto.drivers.Cisco2960]).pop()

server = sw.get_connected('nic-eth', 12)

print "Server connected to port 12 on the switch in the second rack:", server

print "\nThe server's attributes:\n"

for a in server.attrs(ignore_hidden=False):

    print "%s: %s" % (a.key, a.value)

Result:

Server connected to port 12 on the switch in the second rack: BasicServer(name=server0052, type=server, driver=basicserver)

The server's attributes:

ip: -1979711436

ip: IPManager(name=subnet-10.0.0.1, type=resourcemanager, driver=ipmanager)

ip: 10.0.0.52

port-nic-eth: Cisco2960(name=dc1-002-sw, type=networkswitch, driver=cisco2960)

port-nic-eth: 12

port-pwr-nema-5: PowerTowerXM(name=dc1-002-pwr, type=powerstrip, driver=powertowerxm)

port-pwr-nema-5: 12

Now, say I want to know what rack has the server with ip 10.0.0.7:

server = ipman.get_devices('10.0.0.7').pop()

rack=server.parents(clusto_types=[clusto.drivers.BasicRack], search_parents=True).pop()

print "Rack which contains the server with ip 10.0.0.7:", rack

Result:

Rack which contains the server with ip 10.0.0.7:APCRack(name=rack0, type=rack, driver=apcrack)

And finally, how many servers are in dc1?

servercount = len(dc.contents(clusto_types=['server'], search_children=True))

print "Number of servers in dc1:", servercount

Result:

Number of servers in dc1: 60

This is just a taste of what’s possible. There are many more useful features like versioning, pools, more ways to query, command line utilities, a dhcp server, and a web interface just to name a few. The clusto model could be a useful approach for taming infrastructures. The common interface it exposes could hopefully help facilitate the building and sharing of tools for managing different hardware and interfacing with more software systems. There are lots of improvements to be made but we’re already finding it useful and thought it was time to share. We’ve started a google group for the project, the code is up on github with somedocumentation, and you can aptitude install clusto from our debian repo.

forumei

Jumat, 14 Mei 2010

Introducing Clusto

Tidak ada komentar: