Assignment III: Code Generation

In this third assignment, you will build upon the second assignment and will compile SQL statements into DBM code. You will not have to write a SQL parser as part of this assignment; we provide a SQL parser that produces a relational algebra-like representation of the SQL query, and you must generate DBM code starting from that representation.

This assignment is divided into the following parts:

  1. Loading the schema from a chidb file

  2. Generating code for simple SELECT queries

  3. Generating code for INSERT statements

  4. Generating code for CREATE TABLE statements

  5. Generating code for NATURAL JOIN queries

Please note that the first step has to be implemented before you move on to the remaining steps. After your implemented schema loading, steps 2, 3, and 4 can be done in parallel (step 5 can’t be done until you’ve done step 2).

Code Generator code

Your work on the code generator mill be mostly contained to the codegen.c file. This file contains a single function called chidb_stmt_codegen that takes two parameters:

  • stmt: A chidb_stmt that has been initialized but not modified in any other way. i.e., it is a DBM with no instructions, and with the default number of registers and cursors (all set to the “unspecified” type).

  • sql_stmt: A chisql_statement_t struct, representing the parsed SQL statement. This type, along with all the types related to it, are defined in the header files in include/chisql/.

chidb_stmt_codegen is called from the chidb API function chidb_prepare, which parses the SQL string before passing it along to chidb_stmt_codegen (it actually invokes the query optimizer before doing so, but right now the optimizer makes no changes to the query; you will be writing a query optimizer in the next assignment).

Right now, chidb_stmt_codegen generates a hard-coded DBM program. In this assignment, you will replace that code with your code generator.

You are welcome (and encouraged) to write auxiliary functions inside codegen.c.

Testing your Code Generator

Like Assignment II, we provide several DBM programs written in the DBM File Format. However, unlike Assignment II, these files contain SQL queries instead of DBM programs. These queries will be compiled into DBM programs before they are run, so they require a working code generator.

All these programs are stored in tests/files/dbm-programs (specifically, in the directories that start with sql-) and are run automatically by make check. The steps below also specify the location of the test programs that correspond to each step, as well as how to run the tests only on those files.

To test your implementation, you can also use the chidb shell to test individual SQL statements. You may also want to use the shell’s .parse command to see what the internal representation of a SQL query will be. The shell also has an EXPLAIN command that will allow you to see the code generated by your code generator before running it.

Step 1: Schema Loading

Before generating code, you must load the schema information from the chidb file, as it will be necessary for (1) knowing the root page of each table in the database and, (2) validating that a query is not just syntactically correct, but also valid for the given database.

For example, given a database with the following table:

CREATE TABLE courses(code INTEGER PRIMARY KEY, name TEXT, prof BYTE, dept INTEGER);

The following SQL statement is syntactically correct, but not valid for this table because it refers to columns that do not exist in the table:

SELECT name FROM courses WHERE quarter = "Spring" AND year = 2015;

Similarly, this SQL statement is syntactically correct, but not valid because dept is an integer column, not a string:

SELECT name FROM courses WHERE dept = "CMSC";

In this step, you will modify the implementation of chidb_open in api.c to load the database schema into the chidb struct (you will also need to modify the definition of the chidb struct in chidbInt.h to do this).

Remember that chidb files have a special table rooted in page 1 that contains the schema table. This is where you will have to load the schema from (the B-Tree functions are enough to do this). The in-memory representation of the schema is up to you, but we recommend you use the chisql_parser function (defined in include/chisql/) to parse the CREATE TABLE statement contained in each row of the schema table. We also suggest you write functions that allow you to easily perform the following operations:

  1. Given a table name, determine whether such a table exists.

  2. Given a table name, obtain its root page.

  3. Given a table name and a column name, determine whether such a column exists in the table.

  4. Given a table name and a column name, obtain the type of the column.

If you add functions to perform these operations, you should add them to utils.[ch].

Step 2: Simple SELECT Code Generation

You must generate code for a constrained subset of SELECT that meets all the following constraints:

  • The query only involved a single table.

  • The query will include either a list of columns or will be a SELECT *.

  • The query optionally has a WHERE clause but, when it does, it will always have a single condition be of the form column OP value, where OP can only be =, >, >=, <, or <=, and value can only be an integer or string literal.

Before generating the DBM code, you must check that the query is valid:

  • The table must exist.

  • The columns must exist in the specified table.

  • If a WHERE clause is included, the type of column must match the type of value.

If any of these conditions are not met, chidb_stmt_codegen must return CHIDB_EINVALIDSQL.

Note that, besides generating the DBM code, you will also have to set appropriate values for stmt->nCols and stmt->cols.

We suggest you look at the code in tests/files/dbm-programs/select/ for examples of DBM programs that correspond to various SELECT queries. You may also find it useful to use the EXPLAIN command in SQLite (not chidb) to see the kind of code that is generated by the SQLite code generator. Please take into account that chidb is a subset of SQLite, so you will see instructions that are not part of the chidb architecture, or which have slightly different parameters.

The DBMF files to test your SELECT code generation are located in tests/files/dbm-programs/sql-select/.

You can run just those DBM programs by running the following:

make tests/check_dbm && CK_RUN_SUITE="dbm-sql-select" tests/check_dbm

Step 3: INSERT Code Generation

You must generate code for a constrained subset of INSERT that meets all the following constraints:

  • The INSERT always specifies values for all columns. There are no default values.

  • The name of the table is never followed by a list of column names.

  • The values are integer or string literals (never NULL).

Before generating the DBM code, you must check that the statement is valid:

  • The table must exist.

  • The types of the values must match the types of the columns.

If any of these conditions are not met, chidb_stmt_codegen must return CHIDB_EINVALIDSQL.

We suggest you look at the code in tests/files/dbm-programs/insert/ for an example of a DBM program that corresponds to an INSERT statement.

The DBMF files to test your INSERT code generation are located in tests/files/dbm-programs/sql-insert/.

You can run just those DBM programs by running the following:

make tests/check_dbm && CK_RUN_SUITE="dbm-sql-insert" tests/check_dbm

Step 4: CREATE TABLE Code Generation

You must generate code for a constrained subset of CREATE TABLE that meets all the following constraints:

  • Each colum in the list of columns will always be of the form name type, where type can be either TEXT or INTEGER.

  • The first column will always be of type INTEGER PRIMARY KEY.

  • No additional constraints can be specified on the table.

Before generating the DBM code, you must check that the statement is valid. This only requires checking that the database does not already have a table with the same name. If it does, then chidb_stmt_codegen must return CHIDB_EINVALIDSQL.

We suggest you look at the code in tests/files/dbm-programs/create/ for an example of a DBM program that corresponds to a CREATE statement.

The DBMF files to test your CREATE code generation are located in tests/files/dbm-programs/sql-create/.

You can run just those DBM programs by running the following:

make tests/check_dbm && CK_RUN_SUITE="dbm-sql-create" tests/check_dbm

Step 5: NATURAL JOIN Code Generation

You must generate code for SELECT queries involving a NATURAL JOIN of two tables. You must continue to support the features in Step 2, taking into account that column names may be qualified with a table name (e.g., courses.dept to indicate column dept of table courses).

Tests for NATURAL JOIN are not currently available, but will be posted soon.