The Quake2World engine draws a fair amount of attention on account of its performance when compared to other Quake- and Quake2-derived engines. On certain hardware, it benchmarks as much as 400% faster than the Quake2 3.21 engine source on which it was originally based. So I figured I'd take a moment to talk about some of my optimizations.
Quake2 was released in 1997. Hardware acceleration was only available on higher-end PCs, and features like multitexture and vertex arrays, which are commonplace today, were either brand new or not yet widely supported. So naturally, Quake2's rendering techniques appear very dated in 2008. Multitexture arrived as the ARB_multitexture extension alongside OpenGL 1.2.1, became a core feature in version 1.3, and is available on most 2nd generation hardware (TNT or newer). I strongly recommend cleaning up the renderer and removing any non-multitexture rendering paths.
Vertex arrays are a bit more effort to introduce correctly, but are also one of the keys to achieving higher geometric complexity in your scenes. Vertex buffer objects are rather easy to slot in right next to your vertex array code if you plan your GL state management intelligently. The advantage of using these techniques is that you can reduce the number of API calls per frame (e.g. glVertex3fv) by an entire order of magnitude.
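To put a rough number on that claim, here is a small back-of-the-envelope sketch. It assumes a single-textured polygon drawn in immediate mode with one glTexCoord2f and one glVertex3fv per vertex, wrapped in glBegin and glEnd; the function names are made up for illustration and the exact counts in a real renderer will differ.

```c
#include <assert.h>

/*
 * Hypothetical helper: API calls needed to draw one polygon in immediate
 * mode. Assumes glBegin, then one glTexCoord2f and one glVertex3fv per
 * vertex, then glEnd.
 */
int ImmediateModeCalls(int num_vertices) {
    assert(num_vertices >= 3);
    return 1 + 2 * num_vertices + 1;
}

/*
 * Hypothetical helper: with the vertex and texture coordinate arrays
 * already bound, the same polygon is a single glDrawArrays call,
 * regardless of how many vertices it has.
 */
int VertexArrayCalls(int num_vertices) {
    assert(num_vertices >= 3);
    return 1;
}
```

For a four-sided world surface that works out to ten calls against one, and the gap widens as polygons get larger.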
The smartest way to do this is to assemble massive precomputed vertex arrays for all static geometry at level load. This includes the world model (.bsp) and all non-animated mesh models (.md2, .md3, ...). World surfaces do not need to hold references to all of their vertexes and texture coordinates; they can simply hold an integer offset into the shared arrays created for the .bsp. With arrays in place, drawing a series of surfaces reduces to binding the arrays in client state once, then calling glBindTexture and glDrawArrays for each surface.
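The shared-array layout can be sketched roughly as follows. This is a simplified illustration, not the engine's actual code: the type names (surface_t, world_arrays_t), the function names and the fixed capacity are all assumptions. The GL calls appear only in a comment so that the block compiles without a GL context.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; real engine structures hold much more. */
typedef struct {
    int first_vertex;   /* offset into the shared arrays, in vertices */
    int num_vertices;   /* number of vertices in this polygon */
} surface_t;

typedef struct {
    float *verts;       /* 3 floats per vertex */
    float *texcoords;   /* 2 floats per vertex */
    int num_vertices;   /* vertices stored so far */
    int capacity;       /* vertices the arrays were sized for */
} world_arrays_t;

/* Allocates the shared arrays once, at level load. Returns 0 on success. */
int World_AllocArrays(world_arrays_t *w, int capacity) {
    assert(capacity > 0);
    w->verts = malloc(sizeof(float) * 3 * (size_t) capacity);
    w->texcoords = malloc(sizeof(float) * 2 * (size_t) capacity);
    w->num_vertices = 0;
    w->capacity = capacity;
    if (!w->verts || !w->texcoords) {
        free(w->verts);
        free(w->texcoords);
        w->verts = w->texcoords = NULL;
        return -1;
    }
    return 0;
}

/*
 * Appends one polygon to the shared arrays and records only its offset
 * and length in the surface. Returns 0 on success, -1 if the arrays are full.
 */
int World_AddSurface(world_arrays_t *w, surface_t *surf,
                     const float *verts, const float *texcoords, int count) {
    assert(count >= 3);
    if (w->num_vertices + count > w->capacity)
        return -1;

    surf->first_vertex = w->num_vertices;
    surf->num_vertices = count;

    memcpy(w->verts + 3 * w->num_vertices, verts, sizeof(float) * 3 * count);
    memcpy(w->texcoords + 2 * w->num_vertices, texcoords, sizeof(float) * 2 * count);

    w->num_vertices += count;
    return 0;
}

/*
 * At draw time, with a GL context (not shown here), this becomes roughly:
 *
 *   glEnableClientState(GL_VERTEX_ARRAY);
 *   glEnableClientState(GL_TEXTURE_COORD_ARRAY);
 *   glVertexPointer(3, GL_FLOAT, 0, w->verts);
 *   glTexCoordPointer(2, GL_FLOAT, 0, w->texcoords);
 *
 *   then, for each surface:
 *   glDrawArrays(GL_POLYGON, surf->first_vertex, surf->num_vertices);
 */
```

Because each surface carries only two integers, the same offsets should keep working if the arrays are later moved into vertex buffer objects.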
Texture binds (glBindTexture) are rather expensive too, so to minimize these per frame, you should group the world surfaces by texture at level load. I use a level of indirection via pointer arrays to accomplish this. Arrays of surface pointers are assembled according to world texture, and I iterate over these arrays after marking the visible surfaces each frame.
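One way the per-texture pointer arrays might be built is sketched below. The names (texture_group_t, World_GroupByTexture, World_DrawVisible) and the caller-supplied pointer pool are assumptions made for this example, and the GL calls are left as comments so the block stands alone.

```c
#include <assert.h>

/* Hypothetical, simplified surface; texnum indexes the world textures. */
typedef struct {
    int texnum;
    int first_vertex;
    int num_vertices;
    int visible;        /* set while marking visible surfaces */
} surface_t;

typedef struct {
    surface_t **surfaces;  /* slice of the shared pointer pool */
    int count;
} texture_group_t;

/*
 * Fills `groups` (one entry per texture) with pointers to the surfaces
 * using that texture. `pool` must have room for num_surfs pointers.
 * Runs once at level load.
 */
void World_GroupByTexture(surface_t *surfs, int num_surfs,
                          texture_group_t *groups, int num_textures,
                          surface_t **pool) {
    /* Pass 1: count the surfaces that use each texture. */
    for (int t = 0; t < num_textures; t++)
        groups[t].count = 0;
    for (int i = 0; i < num_surfs; i++) {
        assert(surfs[i].texnum >= 0 && surfs[i].texnum < num_textures);
        groups[surfs[i].texnum].count++;
    }

    /* Carve the pool into one contiguous slice per texture. */
    surface_t **next = pool;
    for (int t = 0; t < num_textures; t++) {
        groups[t].surfaces = next;
        next += groups[t].count;
        groups[t].count = 0;  /* reused as the fill cursor in pass 2 */
    }

    /* Pass 2: store each surface pointer in its texture's slice. */
    for (int i = 0; i < num_surfs; i++) {
        texture_group_t *g = &groups[surfs[i].texnum];
        g->surfaces[g->count++] = &surfs[i];
    }
}

/*
 * Walks the groups in texture order and returns the number of texture
 * binds the frame would need: at most one per texture with a visible surface.
 */
int World_DrawVisible(const texture_group_t *groups, int num_textures) {
    int binds = 0;
    for (int t = 0; t < num_textures; t++) {
        int bound = 0;
        for (int i = 0; i < groups[t].count; i++) {
            const surface_t *s = groups[t].surfaces[i];
            if (!s->visible)
                continue;
            if (!bound) {
                /* glBindTexture(GL_TEXTURE_2D, ...) would go here. */
                bound = 1;
                binds++;
            }
            /* glDrawArrays(GL_POLYGON, s->first_vertex, s->num_vertices); */
        }
    }
    return binds;
}
```

With this layout the number of binds per frame is bounded by the number of textures in view rather than the number of surfaces.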
Next, and you may have picked up on this if you've been digging through the renderer code, a significant performance boost can be attained by simply flagging visible world surfaces during BSP recursion rather than drawing them, and reusing these flags for multiple frames whenever possible. The idea here is to only recurse the BSP tree when the view origin or angles change enough for this to be necessary. You may be surprised how often this expensive recursion can be skipped.
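The skip test itself can be very small. The sketch below is only an illustration under stated assumptions: the type and function names are hypothetical, the thresholds are placeholder values, and it presumes the marking pass flags surfaces conservatively enough that small view changes cannot expose an unmarked surface.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float origin[3];
    float angles[3];
} view_t;

typedef struct {
    view_t last;     /* view used for the last full BSP recursion */
    int valid;       /* 0 until the first recursion has run */
    int recursions;  /* how many times the tree was actually walked */
} vis_cache_t;

/* Placeholder thresholds (assumptions); tune them for the engine at hand. */
#define VIS_ORIGIN_EPSILON 1.0f
#define VIS_ANGLES_EPSILON 1.0f

static float Vis_Abs(float x) {
    return x < 0.0f ? -x : x;
}

/*
 * Returns 1 if any origin or angle component moved past its threshold.
 * Angle wrap-around (e.g. 359 to 0) is not handled; it simply reads as
 * a large change and forces a recursion, which is the safe direction.
 */
static int Vis_ViewChanged(const view_t *a, const view_t *b) {
    for (int i = 0; i < 3; i++) {
        if (Vis_Abs(a->origin[i] - b->origin[i]) > VIS_ORIGIN_EPSILON)
            return 1;
        if (Vis_Abs(a->angles[i] - b->angles[i]) > VIS_ANGLES_EPSILON)
            return 1;
    }
    return 0;
}

/*
 * Calls mark_surfaces (the BSP recursion that flags visible surfaces)
 * only when the view has changed enough. Returns 1 if the tree was
 * walked this frame, 0 if the previous flags were reused.
 */
int Vis_Update(vis_cache_t *cache, const view_t *view,
               void (*mark_surfaces)(const view_t *view)) {
    assert(cache != NULL && view != NULL);

    if (cache->valid && !Vis_ViewChanged(&cache->last, view))
        return 0;

    if (mark_surfaces)
        mark_surfaces(view);

    cache->last = *view;
    cache->valid = 1;
    cache->recursions++;
    return 1;
}
```

Comparing against the view of the last recursion, rather than the previous frame, keeps a slow drift from accumulating unnoticed.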
There are additional benefits to decoupling your BSP recursion strategy from your rendering functions as described above. The flexibility gained here opens the door for multithreading, allowing a second core to tackle the BSP work while the first adds particles and entities to the view. Even better, you're free to introduce a pluggable renderer framework, where renderer "plugins" can process the visible surface lists in fun and interesting ways.
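As a rough illustration of the plugin idea, a renderer pass could be little more than a function pointer that receives the visible surface list. Everything here is hypothetical: the struct layout, the names, and the two counting "plugins", which stand in for real passes that would issue GL calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified surface and visible-surface list. */
typedef struct {
    int texnum;
    int first_vertex;
    int num_vertices;
} surface_t;

typedef struct {
    surface_t **surfaces;
    int count;
} surface_list_t;

/* A renderer "plugin": a named callback plus private state. */
typedef struct {
    const char *name;
    void (*draw_surfaces)(const surface_list_t *list, void *state);
    void *state;
} renderer_plugin_t;

/* Hands the same visible list to every plugin, in registration order. */
void R_RunPlugins(const renderer_plugin_t *plugins, int num_plugins,
                  const surface_list_t *visible) {
    for (int i = 0; i < num_plugins; i++) {
        assert(plugins[i].draw_surfaces != NULL);
        plugins[i].draw_surfaces(visible, plugins[i].state);
    }
}

/* Stand-in plugin: adds the number of visible surfaces to an int. */
void Plugin_CountSurfaces(const surface_list_t *list, void *state) {
    int *total = state;
    *total += list->count;
}

/* Stand-in plugin: adds the number of visible vertices to an int. */
void Plugin_CountVertices(const surface_list_t *list, void *state) {
    int *total = state;
    for (int i = 0; i < list->count; i++)
        *total += list->surfaces[i]->num_vertices;
}
```

Since plugins only read the list, the marking pass that produces it can be swapped out, or moved to another thread, without touching them.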
These are probably the most drastic and beneficial changes I've made. There are many others, and I hope to find more. "The fastest way to do something is to not do it." Keep that phrase in mind, and keep plugging away at it.