CQL by Example

This document (still under development) is part of the Scid++ project and is actually a suite of documents relating to the Chess Query Language. CQL is a novel and powerful language developed by Gady Costeff and Lewis Stiller, with application across a wide range of orthodox chess interests.

We started out many moons ago referring to the document as a tutorial. It's not. It never will be. To our knowledge, there are only two individuals on this planet who are qualified to tutorialize on the subject matter, and the author of this document is not one of them.

Instead, one should look on what follows as a record of discovery. An accounting of the learning process, warts and mistakes and wrong turns and dumb ideas proudly on display.¹

So why should we² even bother with all of this scribbling and doodling? The project is an experiment. A proof of concept. Experiments should be documented. A proof should include supporting evidence. That's our story. We're sticking to it.

The grand experiment

Is there a place for CQL in the problem domain? Is there a serious role for the language across at least a significant slice of the orthodox side of things? That is our question. This document seeks to provide the answer.

An even better inquiry might be, "Why has the question not already been answered?" The essential pieces have been in place for years (decades): the solving engine, the databases (sans solution), the query language. As it turns out, tying all the pieces together has not been difficult at all. It has only required a bit of grunt work. The challenge is in showing that we can do interesting things with the end result.

Answering that challenge has required going places we didn't even know existed. Cross-solution queries (for twinning), multi-pass queries (position tagging, analysis and follow-on filtering), the handling of solution trees with thousands of nodes. Language facilities for much of it have only recently evolved. The CQL developers have been busy.

Evolution of the language

CQL first came to our attention many years ago in the v3.x days. We thought it interesting, original, and worth keeping an eye on. With the release of v5.0 things were getting serious. So serious that with v5.2 we initiated an integration effort with our favorite chess-related tool set. Lewis Stiller graciously contributed some key elements of that effort.

Version 6 of the language came along and on Stiller's recommendation we disentangled the two code bases, primarily to accommodate multi-threading and upgrades with as little fuss as possible. A lot of work went into v6.0, but the magnitude of the advancements in the language was not immediately apparent to us. We had to use the revisions in order to fully appreciate just how much improved the language had become.

The follow-on version 6.1 of the language is a whole different matter. The magnitude and scope of the release fairly slaps one right upside the head. Rely not on Stiller — that masterful purveyor of understatement — to adequately convey the importance of this release. It's huge. A very well thought out string implementation (consider that a string can represent anything), full blown ECMA-compliant regular expressions, associative arrays, persistence across games (think twins) and across queries (think multipass). One could wax hyperbolic at length.

While the latest version 6.2 (currently in beta test as of this writing) is focused more on the natural and concise expression of chess themes, it also includes many new and powerful features which make our lives considerably less complicated. We highly recommend the release.

The capabilities of the language have evolved so significantly over the course of the past couple of years that it far exceeds the sum of its parts. We feel that we are limited only by our imagination. We shall put that impression to the test as we populate the document roadmap below... a record of the experiment with no end.

The mainline documents

As the language evolves, so the document evolves. Huge swaths of this document have been discarded or rewritten over time. Editing for incremental improvement is a constant. We expect that process to continue as we discover "better ideas".

We've split out the main document into two parts. As a single doc it was growing too large and unwieldy, and the two parts are really not related. The first part addresses the application of CQL in a domain that is not particularly aligned with this project. The second part gets down to the business of the grand experiment.

The conventional domain

Part one of the main document, on queries in the conventional domain, fulfills a commitment to the upstream project. When the CQL engine was first integrated with the upstream executable and merged into that project, the need for a large collection of examples addressing over-the-board games became apparent.

In addressing that commitment, we hoped to demonstrate the broad range of the application of the language in a conventional chess context. We also hoped to lay a decent foundation for the harder things to come. Even experienced CQL users might find a few of the examples to be entertaining.

The problem domain

While the integration of the CQL and Popeye engines with the Scid application is well-warranted in its own right, the entire point of the Scid++ project lands right here, squarely in the problem domain. If CQL has practical, demonstrable application in that domain, the proof of it (or at least the supporting evidence for it) will be found in part two of the main document.

Progress on the problem domain will come in fits and starts. We'll periodically go off on a tangent to work on a companion document or to add a feature to the Scid tool chest. What's there looks promising, but we have a long way to go.

The companion documents

We call this our CQL Universe series. Diversions from the mainline, much of it wildly speculative. This section is a guide to documents currently in some stage of planning or development, laying out something of a roadmap for things to come.

If there is a hint of tutorializing anywhere in the document suite, this is where it will show. One should keep in mind, however, that there is nothing authoritative anywhere in our writing regarding CQL. We're making it up as we go.

Understanding sets

This first companion was inspired by a suggestion from an early reviewer of the main document. We were initially skeptical, but Understanding Sets in a CQL Universe has received a good bit of positive feedback from newcomers as it addresses a fundamental pillar of the language.

Square sets are everywhere in CQL, and visualization of those sets on the board is apparently a key to comprehending the language. After one's initial review of the online reference docs, this should probably be the next stop.

Game tree traversal

When we first moved from the mainline to the solution tree with its scores of variations, we found ourselves in a bit of a state of shock. After years of dabbling, we thought we had a fair grasp of the language. The truth is that we had barely scratched the surface.

To this day, we still have no intuitive feel for much of the traversal mechanics employed by the language. The behavior of the move filter with linearization disabled, for example, is exactly what one would expect after thinking about it for several hours over a pitcher of Heineken.

We authored Game Tree Traversal in a CQL Universe not for the benefit of others but for our own edification. Had we not, we would be hopelessly lost as we wander our way through the solutions forest looking for themes and patterns. The document represents just a beginning of an understanding of the traversal of the game tree, and most probably with a few ~~lies~~ inaccuracies thrown in.

One should take this companion as a hint of what lies ahead in the problem domain and as a guide to pursuing one's own forensics. When trying to understand the inner workings of, say, the line filter with linearization enabled and repetition qualifiers in play, the message filter is one's indispensable friend.

Handling twins

Handling Twins in a CQL Universe began as a bit of a lark. A challenge. A dare (she knows who she is). It quickly became one of the more interesting and fun things that we have ever done with the language. It helps if one's mind is a bit twisted, and in the wake of a session of pattern matching across twins one might feel the need for a long visit with one's therapist. But the experience has been most rewarding.

Twinning represents a significant slice of the problem domain, and so we look on this companion as more than just a sidebar to our larger "proof". Some very interesting patterns may manifest across twins and the techniques developed in their handling will likely have application elsewhere.

Update: Visits to the therapist may be a thing of the past. (Or not.) We have mixed emotions about the release of CQL v6.1, as the mind requires less twisting now that we have a completely new set of lexical toys which just happen to be uberly handy in a twinning context. It's as if the CQL gods could no longer bear to witness the tortuous querying and deigned to take pity on the poor plebe. We suppose we should be grateful. We're still pondering.

Transforming boards

Not exactly boards. Representations of boards. Think set variables (bitboards) or perhaps even a FEN string. Rumor has it that transformations may be realized with nothing more than direction filters. We've experimented with the assertion and are mostly convinced that it holds.

The need for such a capability arises in, e.g., cross-twin queries where positions are persistently carried across a series and might require shifting or rotating, for example, for comparison purposes. We might also find a similar approach useful in detecting certain forms of symmetry.
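As an illustration only, and outside of CQL altogether, here is a minimal Python sketch of what we mean by shifting, flipping and rotating a bitboard. The bit layout (a1 as bit 0, h8 as bit 63) and every function name are our own assumptions for the example. Nothing here reflects how CQL represents sets internally or how direction filters get the job done.

```python
FILES = "abcdefgh"

def to_bitboard(names):
    """Build a 64-bit integer from square names (assumed layout: a1 = bit 0)."""
    bb = 0
    for name in names:
        file, rank = FILES.index(name[0]), int(name[1]) - 1
        bb |= 1 << (rank * 8 + file)
    return bb

def squares(bb):
    """List the square names set in a bitboard, sorted alphabetically."""
    return sorted(FILES[i % 8] + str(i // 8 + 1)
                  for i in range(64) if (bb >> i) & 1)

def transform(bb, mapping):
    """Apply mapping(file, rank) -> (file, rank) to every set square.
    Squares that land off the board are silently dropped."""
    out = 0
    for i in range(64):
        if (bb >> i) & 1:
            f, r = mapping(i % 8, i // 8)
            if 0 <= f < 8 and 0 <= r < 8:
                out |= 1 << (r * 8 + f)
    return out

def shift(bb, dfile, drank):
    """Slide the whole set by dfile files and drank ranks."""
    return transform(bb, lambda f, r: (f + dfile, r + drank))

def flip_vertical(bb):
    """Swap rank 1 with rank 8, rank 2 with rank 7, and so on."""
    return transform(bb, lambda f, r: (f, 7 - r))

def flip_horizontal(bb):
    """Swap the a-file with the h-file, the b-file with the g-file, and so on."""
    return transform(bb, lambda f, r: (7 - f, r))

def rotate_clockwise(bb):
    """Quarter turn clockwise as seen from White's side (a1 goes to a8)."""
    return transform(bb, lambda f, r: (r, 7 - f))

if __name__ == "__main__":
    corners = to_bitboard(["a1", "h1"])
    print(squares(shift(corners, 1, 1)))
    print(squares(rotate_clockwise(corners)))
```

The loop over 64 squares is slow and deliberately so. Clarity over cycles. A shifted set loses whatever falls off the edge of the board, which is worth keeping in mind when comparing positions across a series.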

The Transforming Boards in a CQL Universe document will be developed alongside the Handling Twins companion, and may ultimately lead to the assembly of a crude "library" of functionality.

Translating FEN

Huh? From what and to what? Well, bitboards, of course.

In our work on the Transforming Boards companion we found ourselves in need of a bi-directional translation facility between FEN strings and a representative suite of bitboards. We're having difficulty visualizing transformed bitboards, especially with the flips and rotations. If we can translate between representations, we can write our transformed boards to an EPD file for easy viewing. Problem solved.
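To make the idea concrete, here is a minimal sketch of the round trip in Python rather than in CQL. It handles only the piece placement field of a FEN string, keeps one bitboard per piece letter, and assumes a layout of our own choosing (a1 as bit 0). The function names are hypothetical and are not taken from the companion document.

```python
def fen_to_bitboards(fen):
    """Map each piece letter in the FEN placement field to a 64-bit integer.
    Assumed layout: a1 = bit 0, h1 = bit 7, a8 = bit 56, h8 = bit 63."""
    placement = fen.split()[0]
    boards = {}
    # FEN lists ranks from 8 down to 1, files from a to h.
    for row, rank_text in enumerate(placement.split("/")):
        rank = 7 - row
        file = 0
        for ch in rank_text:
            if ch.isdigit():
                file += int(ch)  # a digit is a run of empty squares
            else:
                boards[ch] = boards.get(ch, 0) | (1 << (rank * 8 + file))
                file += 1
    return boards

def bitboards_to_fen(boards):
    """Rebuild the FEN placement field from a dict of piece letter -> bitboard."""
    rows = []
    for rank in range(7, -1, -1):
        text, empty = "", 0
        for file in range(8):
            bit = 1 << (rank * 8 + file)
            piece = next((p for p, bb in boards.items() if bb & bit), None)
            if piece is None:
                empty += 1
            else:
                if empty:
                    text += str(empty)
                    empty = 0
                text += piece
        if empty:
            text += str(empty)
        rows.append(text)
    return "/".join(rows)

if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    boards = fen_to_bitboards(start)
    print(hex(boards["P"]))
    print(bitboards_to_fen(boards))
```

The string returned by bitboards_to_fen is the placement field only. The side to move, castling and en passant fields still have to be appended before the result can be written out as an EPD record.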

This little diversion — Translating FEN in a CQL Universe — probably belongs in the String Theory companion, but we have no idea what that document looks like yet and we're in a hurry. So there you have it.

String theory (tentative)

Well, no. Not that string theory.

With the introduction of first class strings and regular expressions in version 6.1 of the language, the world of possibilities appears to be unbounded. What can one not represent with a string? Carrying information across solutions and queries just became much easier.

We suspect that string handling techniques developed for twinning and prophylaxis will have application in other areas. Putting it all in a companion document makes sense.

Query design strategies (planned)

We've learned a few things about constructing queries looking for some rather nebulous chess concepts and ideas. The direct approach is often the least effective approach. For example, rather than asking what are the essential elements of a pattern, we might do better to ask what is the pattern not.

This companion will also likely include a section on multi-pass queries. When and why to employ them, and how to carry persistent information across them.

Problem analysis (planned)

Distantly related to the task of searching for patterns is the challenge of recognizing patterns as one is solving problems. We set out with this document to defeat the whole purpose of problem solving. But we have the novice (namely ourselves) in mind, who might be wondering how does one even get started? And how much of the nuance in the composition escapes one when the theme runs deep?

We'll use the Babson task as a case study in demonstrating the use of a handful of Scid tools in analysing compositions. CQL and Stockfish play a role, but the solving engine is at the center of the analysis and some lesser tools can make the process slightly less daunting. Taking a deep dive into the solution tree and considering some branches as having stipulations of their own can lead to some revealing insights.

Seeking prophylaxis (planned)

As a bit of a diversion from our activities in the problem domain, we're entertaining a rather unlikely application of the language. Namely, the search for deep prophylactic ideas in a database of grandmaster works of art. That we're even considering such a thing speaks volumes of the language.

The impetus for this idea originated in an e-mail exchange with an Indian GM who had come across our example of a search for shallow (superficial) prophylaxis in the mainline document. That example employs a language technique for anticipating a move that does not exist in the game tree. The benefit of such an approach is somewhat limited.

But if we first tag positions of interest with an initial CQL pass, then we can expand the game tree at those positions — with multiPV threat analysis from an engine — giving us evaluations and actual lines of play for examination by a follow-on CQL query. Threat analysis takes place on the other side of a null move at the tagged position, and the threat lines can be compared against the main line. It's an interesting idea.

We expect that prophylaxis is just one of many applications of the multi-pass query, especially now that version 6.1 of the language allows for persistence across queries. This particular application requires a minor extension to the Scid analysis handling subsystem, but that's likely an exceptional intervention. Multi-pass queries may well become commonplace in the not-too-distant future.

Tools and resources

Contacting

Questions and comments relating to this document, or to CQL in general, may be directed to the Scid++ project's CQL mailing list at SourceForge.

Credits

This document is Copyright (c) 2019-2024 Lionel Hampton. The Chess Query Language was originally developed by Gady Costeff and Lewis Stiller. The upstream Scid vs PC project is managed by Steven Atkinson.

Thanks go to Costeff and Stiller for their innovative work on CQL. Lewis Stiller has lent considerable assistance to the author of this document and to the success of the integration of the various CQL engines with the Scid vs PC application.

Piger hæreditabit terram

We feel obliged to explain certain personal failings which we indulge throughout the document.

The astute engineering type might have observed that we are in the habit of mindlessly wasting CPU cycles in the construction of our example queries. Actually, it's not mindless at all. It's quite intentional. It's all in the interest of clarity.

If we can save just one line of code in the name of clarity, we believe it is well worth a few melted transistors. And if in the process we can avoid having to invent the name of a single superfluous variable (we hate naming variables), then universal justice is surely at hand. Calculate that value a thousand times over, we say, rather than waste space with an assignment.

It's a personal choice. We freely confess the infraction. And we happen to know, in any event, that Stiller went to extraordinary lengths to give us multi-threading specifically so that we could trade away the performance improvements for clarity's sake.

Let us hear no more heckling from the bleachers.

Language notes

There are a number of issues that Scid++ users and consumers of this document suite should be aware of. These issues are more-or-less specific to the use of CQL in the context of the Scid++ project.

Version compatibilities

The mass of example queries found in this document suite originated with the release of CQL v6.0, and as such are lagging considerably behind the language's latest features, syntax, and capabilities. While successive CQL releases are generally backward compatible, there are the inevitable exceptions to the rule. A significant number of the examples in their present state will fail to execute with the latest CQL releases:

because persistent declarations sans assignment are no longer allowed (v6.1), or
because in many cases the dictionary key/value types now need to be "declared" early on in the query (v6.2).

We are presently working to address these backward incompatibilities and, beyond that, to simplify those examples whose earlier and more complicated techniques have been made unnecessary by the v6.1 and v6.2 revisions of the language.

A second issue that we'll eventually have to deal with is the deprecation of filters and syntactical constructs which are in wide use throughout the document suite. These deprecations are numerous and include some rather significant filters (most notably the move and line filters). For some of the examples this may require non-trivial resolution, and so we may just decide to leave them unrevised with a note indicating their version compatibilities.

Our priorities in addressing these issues are first (and overwhelmingly) with the Twins document -- which has garnered the greatest amount of attention by a huge margin -- and then with other companion documents having a direct impact on that document. Revisions are already underway (October 2023) but will take some considerable amount of time to complete.

CQL header pseudo-parameters

We've introduced a number of pseudo-parameters to the CQL header which are subject to pre-processing by the Scid++ CQL search facility prior to launching the CQL sub-process. These parameters are removed from the CQL header prior to writing the temporary CQL file.

A couple of useful CQL command line options are not otherwise available as header parameters. Scid++ users may give the alwayscomment and nolinearize command line options as header parameters and those parameters will be converted to their respective command line options.
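The pre-processing step might be pictured along the following lines. This is a rough Python sketch and not the actual Scid++ implementation. It assumes a header of the form cql(...) with whitespace-separated parameters and no parentheses or spaces inside quoted file names. The function name is hypothetical, and the sketch returns bare option names, leaving their exact spelling on the CQL command line to the caller.

```python
import re

# The two pseudo-parameters named in the text above.
PSEUDO_OPTIONS = ("alwayscomment", "nolinearize")

def split_pseudo_options(query):
    """Return (cleaned_query, option_names).

    Pseudo-parameters found in the cql(...) header are removed from the
    query text and handed back separately, so that the caller can pass
    them on as command line options (hypothetical helper, simplified)."""
    match = re.search(r"cql\s*\(([^)]*)\)", query)
    if match is None:
        return query, []  # no header present, nothing to do
    tokens = match.group(1).split()
    found = [t for t in tokens if t in PSEUDO_OPTIONS]
    kept = [t for t in tokens if t not in PSEUDO_OPTIONS]
    header = "cql(" + " ".join(kept) + ")"
    cleaned = query[:match.start()] + header + query[match.end():]
    return cleaned, found

if __name__ == "__main__":
    text = "cql(input games.pgn nolinearize alwayscomment)\nmate"
    cleaned, options = split_pseudo_options(text)
    print(cleaned)
    print(options)
```

The cleaned text is what would be written to the temporary CQL file. The returned names are what would be converted to command line options when the sub-process is launched.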

We've also introduced an include pseudo-directive for loading libraries of CQL functionality required by a query. The libraries are inserted into the CQL file immediately following the header. Numerous examples of its use will be found throughout the document suite. The project's support files include a collection of libraries used primarily by examples in the Twins companion document.

Footnotes

¹ The committed grammaticaster should probably read no further. Much of this document is written in conversational English. We use the comma for pause. The period for effect. Nouns and verbs are a nuisance. Run-on sentences are beautiful. Especially when a single run-on fully comprises a paragraph. We know that makes some people crazy. We don't care.

² And don't even get us started on the utter stupidity of first person plural active voice...