Preparing For My Interviews Part 2: MySQL and Python
By Mark Nielsen
Preparing For My Interviews Part 2: MySQL and Python
By Mark Nielsen
- Introduction
- Mysql Master/Slave and Clusters
- The Python script and Python Module MySQL.py.
- Executing the Python script.
- Some commands to execute.
- Next Month: Stored Procedures in MySQL (5.0?)
- Conclusion
Introduction
This article is actually a lie. I am not preparing for interviews anymore.
I started this article at the end of October, but then I got a 2 month
contract right after that. Still, it's the thought that counts.
This article is to help you setup MySQL Master/Slave/Cluster
combinations on one computer. Why would that be useful? Well, for
starters, there isn't an abundance of Master/Slave/Cluster documentation.
The Cluster technology is really new. Second, MySQL is a very very very hot
database. There aren't too many good MySQL DBAs out there. If you want to
secure your job in the future, learn MySQL. The Master/Slave and Cluster
technologies are a must. If you don't know anything about the Master/Slave
or Clusters, you can pretty much forget getting hired. In every MySQL
interview (I had like 10 companies interviewing me in a six week period and
half of them used MySQL for something) the Master/Slave questions popped up
and topics of the Clustering technology came up as well. Third, it is a
pain to setup Master/Slave or Cluster environments by hand. If my script
works on one of your computers, copy over the software, copy over the
config files, make slight changes to the config files, and you can get a
real multi-computer environment setup in minutes.
The second purpose of this article is to show you how to use Python to
write a simple application (compiling and installing MySQL). I tried to
include a lot of stuff in the Python code which is common to most Python
scripts/modules. Also, I am thinking about turning the Python code into a
package in the Vaults of
Parnassus - just for kicks, and so I can say that I have some sort of
Python code published.
So, why did I use the programming language Python to execute all the commands? I
love Python, it is object oriented (from the ground up), has good exception
handling, has been compilable for a long time, it is hard to write ugly
Python code, it is easy to understand other people's code, there is only one
way to do things (usually), it's easy to manipulate, and it is a general tool
from web programming to Unix scripting to database programming to GUI
programming to mathematical programming (my favorite). There are so many
reasons to love Python if you are a true object-oriented programmer. The
other reason why I try to use Python for all personal projects is to
convince people where I work (or will work) that Python rocks and should be
used wherever appropriate. The sad thing is, most managers don't want to
use Python because of the lack of people who can write good Python code, so
it is my duty to convince them otherwise. Update: I got hired at Google,
which uses a lot of Python so now I am a happy camper!
Why compile MySQL? Why not just use the RPMs? I don't think clustering
is built into some of the RPMs. Also, I always like to compile and install
software myself. If you can't compile it, odds are, there might be other
problems. If you do decide to use RPMs, and you decide to use an RPM
server, please use YUM and do not use
commercial RPM servers. All the commercial RPM servers I have seen (just
one, you know who) are way overpriced and are geared towards the executive
staff or people who want to buy support so that they can blame someone if
something goes wrong. The sales staff are really good at making you believe
you can jump to the moon, when really, the software they present only
manages the installation of RPMs. Any 1st year programmer could write a web
interface to do the same thing. By using YUM (it is easy, simple, and free
of restrictions), you empower yourself and you will help bring down the
costs of the the overpriced commercial RPM services. If the commercial RPM
services weren't so overpriced (they should be like $25 per computer max)
and didn't have so many stupid restrictions (like installing
thebind RPM requires a more expensive license), I wouldn't
mind, but they are getting away with doing so little, it is ridiculous.
Politics and the suits have gotten in the way of good technology at some of
these companies.
One more thing, I bought a 2-gig RAM 64 bit AMD CPU with Serial ATA
hard drives. I bought this hardware configuration for the simple purpose of
using a 64 bit operating system. Why? Because many companies are using 64
bit AMD cpus for their databases and other things. Just at work the other
day, I mentioned I got a MySQL cluster to work on my home computer, so I as
asked to setup a test cluster at work. See how valuable it is to stay ahead
of the market?
The nice thing about the 64 bit AMD CPU is that is will also support
the 32bit operating systems. Thus, I was able to take the safe route by
installing a 32 bit Linux OS before I risked using the 64 bit versions
(which had a lot of problems in the past). Plus, don't kill me for saying
it, but I still wanted to play some Windows games that weren't available
for Linux yet. I am close to the point where I never need to use Windows
again. If gaming companies would always make Linux versions of their
products, I would never have use Windows for anything. That would really be
nice.
Mysql Master/Slave and Clusters
What is
MySQL? Well, if you mean the
MySQL database server, it
is an SQL database server available from
mysql.com. It is comparable to
PostgreSQL and Oracle in most ways.
MySQL is one of the two most popular Open Source database servers out there
(PostgreSQL being the other).
What are MySQL
Master/Slaves? The Master MySQL server lets people read and write data
to it. It copies all of its data to the MySQL slave computers. Thus, you
end up with many database servers with the same data. This can be useful
for load-balancing your webservers against many database servers or
performing a failover in case the Master dies (you shutoff the Master and
make one of the Slaves the new Master). Normally, the Slave computers are
read-only. Since most websites have a 4-to-1 ratio of reads to writes,
having many read-only MySQL slaves can be very useful. How it is useful?
Your website can handle more data and use more database connections spread
over multiple servers. All the writes still have to be done to the Master,
but the website can choose which Slave it wants to read data from.
Separating the write connections from the read connections can speed up
your website a lot.
What is MySQL
Clustering? MySQL Clustering lets you put multiple computers together
which are all the same and have all the same data. You can connect to any
computer in the cluster and perform read/write operations which are
immediately available to the other mirrors. The main different between
Clustering and Master/Slaves is that each computer in a Cluster can be
written to while in a Master/Slave configuration you are only suppose to
write to the Master. Also, in Clustering, when you write data to one
machine, the computer will not respond back with an "OK" status until the
data has been copied to all the other computers. With the Master/Slave
configuration, the data is not necessarily copied to the Slaves
immediately. Effectively, you can view the cluster as one single
entity/database. The nice thing is, if one computer is the cluster
crashes, it doesn't affect the other computers or the data in the cluster.
This behavior doesn't exist in the Master/Slave replication. If the Master
goes down, the replication stops.
Clustering is a little more complicated, but it has huge advantages and
will most likely be used a lot by many companies. It is fairly new
technology, so in my opinion, it is good time to start testing it. For
small websites which love to be on the bleeding edge of technology and you
don't mind being risky, go ahead and use MySQL Clustering. I am a little
cautious about using it for heavy performance database needs, but I am
anxious to at least try! You should be warned that there are a lot of
limitations with MySQL Clustering. For example, your database can only be
as big as the free RAM you have. Also, I have had a lot of problems getting
more than just a two storage node cluster working.
The Python script and Python Module MySQL.py
The Python Script "Compile_MySQL.py" is meant to access the MySQL.py
module to install, configure, and get MySQL running. I tried to make
it as simple as possible so that a lot the options you want can be
specified at the command line.
The MySQL.py module currently only has one class called "Installer"
which is mostly finished. It has other classes I am working on, but
nothing worth talking about yet. I included a lot of stuff in the module
like:
- I broke down the main modules to approximate the steps you would
use to install MySQL. In addition, I separated out the compiling from
the configuring because I wanted you to be able to skip the compiling
if you already installed MySQL once.
- Use of the try/except statements.
- The first string of the method(s) is used for automatic documentation
extraction.
- The concept of classes.
- The init and del methods which are executed at creation and destruction
of an object.
- Other internal methods to handle printing an object or when you try
to compare this object against other objects.
- Escaping to shell to execute code, get its return status, and output
data.
- Regular expressions and using them repeatedly. It's only necessary to
create a regular expression once, so that you don't waste resources
recreating regular expressions.
- Storing variables in an object.
- Line command arguments are passed.
- The script only creates one set of MySQL binaries. It uses different
directories and ports for each instance of the MySQL server.
Personally, I think the module could use a lot of work. I see repetitive
code that could be converted into methods, there need to be more
command line options, and there needs to be a place to save the configuration
for later runs (using xml) if we want to use this as an admin tool.
Here is sample script to use my MySQL module. Note, if you already ran
this script and got mysql working, and you want to reconfigure it, then
comment out "M_Obj.Execute_Compile_Script()" to reconfigure mysql. I tried
to comment each below in the script.
#!/usr/bin/python
### Just a bunch of standard modules I load.
import urllib2, urllib, string, re, os, struct
import base64, string, gzip, sys, time, commands
import getopt
### The module I created.
import MySQL
## Initialize our installation object.
M_Obj = MySQL.Installer()
#--------------------------------------------------------
### Build and install the mysql binaries. These binaries will be common
### with all the mysql services.
M_Obj.Write_Compile_Script() ### This creates out bash compile script.
M_Obj.Execute_Compile_Script() ### This executes our bash script.
### This creates one set of binaries for
### all mysql instances.
M_Obj.Stop_Instances() ### In case there are any running.
M_Obj.Setup_Instances() ### Setup and initialize the databases.
M_Obj.Start_Instances() ### Start the databases.
Executing the Python script
Download these files:
- MySQL.py and save it as MySQL.py.
- Compile_MySQL.py and save it as Compile_MySQL.py.
- Config files. Save all the config
files in this directory (or grab this tarball containing all of them) into a
directory called "Config" where you execute Compile_MySQL.py.
Then execute this command:
python Compile_MySQL.py
This will download and install MySQL. It will make a log file called
"/tmp/mysql_install.log". If it doesn't download MySQL, download
mysql-4.1.7.tar.gz manually from mysql.com and save it in the directory
"/usr/local/src/mysql_compile".
NOTE: If you want to specify another database server,
then execute this, substituting the appropriate URL:
python Compile_MySQL.py -d http://www.signal42.com/mirrors/mysql/Downloads/MySQL-5.0/mysql-5.0.2-alpha.tar.gz
Some commands to run
So, now we got the database server installed. Let us do some stuff
to verify the Master/Slaves are working as well as the MySQL Cluster.
I made things really easy for you (assuming the installation was good).
I have made a bunch of scripts for you.
Here are a bunch of scripts in the home directory where mysql got
installed.
| Script | Usage | Purpose |
| bash_aliases | source bash_aliases |
This setups aliases to connect to each service. The aliases connect to
the master service, slave services, ndb_mgm, and each cluster mysqld
service. |
| Start | ./Start | Starts the master, slaves, and
cluster. |
| Stop | ./Stop |
Stops the master, slaves, and cluster. |
The scripts listed below are created in the "scripts" directory
where mysql got installed.
If you did a "source bash_aliases", these commands will be in the bash
path. You can execute these scripts as many times as you like.
| Script | Purpose |
| Master_Status.bash |
Gets the status of the master service. |
| Slave_Status.bash |
Gets the status of the slave services. |
| Master_Slave_Status.bash |
Gets the status of the master and slaves. |
| Master_Slave_Test.bash |
Inserts data and then the master and slaves perform the same
sql query which should show the same results. |
| Cluster_Status.bash |
Gets the status of the cluster |
| Cluster_Test.bash |
Inserts data into the cluster. It shuts down each storage node
and sees if each mysqld server can still access the data. |
Try to learn what the scripts do step by step. If you can fully understand
what those scripts are doing, then you know at least the basics about how to
manage the mysql services.
Follow these steps:
cd /usr/local/mysql-cluster ### Change to the mysql directory.
source bash_aliases ### Load some aliases into the bash shell.
./Start ### Start mysql master, slaves, cluster.
Cluster_Status.bash ### Get the status of the cluster
Master_Status.bash ### Get the status of the master service.
Slave_Status.bash ### Get the status of the slave service.
Master_Slave_Test.bash ### Test the master/slave services.
Cluster_Test.bash ### Test the cluster.
Next Month
I am probably going to use MySQL 5.0 at home from now on. So, I am probably
not going to test MySQL 4.1.x anymore. Saying that, possible next months topics
include:
- MySQL failover for Master/Slave cluster. The master has gone bad and
we want to make a slave the new master.
There are lots of issue to consider when
doing this, like data corruption.
- Stored procedures.
- More cluster configurations.
- Scripts to check if your MySQL Master/Slave/Cluster computers
are running smoothly.
- Maybe check out all the new tools you can download from MySQL?
Conclusion
The Python module was sort of an overkill. However, I wanted to make it
really easy to compile and install MySQL every time there is an upgrade.
I am going to use Python for everything I do and I plan on using it
for all SysAdmin needs in the future (if I can help it). Every time
there is a new version of Python, I am so impressed because it just gets
easier, simpler, and more powerful. In Python 2.4 (which just came out),
sets are really cool for intersection and union calculations.
MySQL is coming along nicely. With all the new features being put into
MySQL, it can compete against some of the larger database companies. It is
important to learn MySQL because new job opportunities are emerging and
MySQL will be a hot item over the next few years (before the masses learn
it). The Python module and the script I wrote make it really easy to get a
fairly complicated set of MySQL instances installed on your computer. To
use it in production, just copy over the binaries and the config files and
make a few changes to the config files and you are done! Once you see how
Master/Slave/Cluster configurations work, it really becomes simple to
understand how it all works.
Afterword:
Get your certifications.
In the last few months, I believe the MySQL Professional Certification
paid off. Most employers don't want just a DBA anymore. PostgreSQL and
MySQL make database servers easy to use for most people. No longer do you
have to worship the DBA from hell because he/she knows all the little
tricks. All the docs and examples are online and you can fool around with
the source code yourself. This is going to put pressure on DBAs to be more
than just DBAs. I believe the MySQL jobs I applied for really liked my
programming and Sys Admin skills. I know for a fact that one company used
my MySQL Professional Certification as leverage to interview me because
they could claim "he is certified". They really wanted a programmer, but
because of politics, they couldn't directly get a programmer, so my
certification let me slip in. Cool, huh? A lot of certifications really
don't tell you whether someone is good or not. However, the LPI and MySQL
Certifications have two things going for them. First, they are actually
reasonable certifications. Most certifications are meant to just make money
for the company, but LPI and MySQL don't seem to be like that; they really
want you to be qualified. Second, managers and HR like certifications
because it protects them. I know most techies hate certifications, but
because of politics, you really need to get certified to fight against the
machine. LPI and MySQL Certifications are fairly inexpensive, unlike a lot
of the other stupid certifications, so it really isn't that bad.
Mark Nielsen was enjoying his work at cnet.com as a MySQL DBA, but
is moving to Google as a MySQL DBA.
During his spare time, he uses Python heavily for mathematical and web
projects.
Copyright © 2005, Mark Nielsen. Released under the Open Publication license
unless otherwise noted in the body of the article. Linux Gazette is not
produced, sponsored, or endorsed by its prior host, SSC, Inc.
Published in Issue 110 of Linux Gazette, January 2005