Assignment III: Code Generation¶
In this third assignment, you will build upon the second assignment and will compile SQL statements into DBM code. You will not have to write a SQL parser as part of this assignment; we provide a SQL parser that produces a relational algebra-like representation of the SQL query, and you must generate DBM code starting from that representation.
This assignment is divided into the following parts:
Loading the schema from a chidb file
Generating code for simple SELECT queries
Generating code for INSERT statements
Generating code for CREATE TABLE statements
Generating code for NATURAL JOIN queries
Please note that the first step has to be implemented before you move on to the remaining steps. After your implemented schema loading, steps 2, 3, and 4 can be done in parallel (step 5 can’t be done until you’ve done step 2).
Code Generator code¶
Your work on the code generator mill be mostly contained to the codegen.c
file.
This file contains a single function called chidb_stmt_codegen
that takes
two parameters:
stmt
: Achidb_stmt
that has been initialized but not modified in any other way. i.e., it is a DBM with no instructions, and with the default number of registers and cursors (all set to the “unspecified” type).sql_stmt
: Achisql_statement_t
struct, representing the parsed SQL statement. This type, along with all the types related to it, are defined in the header files ininclude/chisql/
.
chidb_stmt_codegen
is called from the chidb API function chidb_prepare
,
which parses the SQL string before passing it along to chidb_stmt_codegen
(it actually invokes the query optimizer before doing so, but right now the
optimizer makes no changes to the query; you will be writing a query optimizer
in the next assignment).
Right now, chidb_stmt_codegen
generates a hard-coded DBM program. In this
assignment, you will replace that code with your code generator.
You are welcome (and encouraged) to write auxiliary functions inside codegen.c
.
Testing your Code Generator¶
Like Assignment II, we provide several DBM programs written in the DBM File Format. However, unlike Assignment II, these files contain SQL queries instead of DBM programs. These queries will be compiled into DBM programs before they are run, so they require a working code generator.
All these programs are stored in tests/files/dbm-programs
(specifically, in the
directories that start with sql-
) and are run automatically by make check
.
The steps below also specify the location of the test programs that correspond to each step, as well
as how to run the tests only on those files.
To test your implementation, you can also use the chidb shell to
test individual SQL statements. You may also want to use the shell’s .parse
command
to see what the internal representation of a SQL query will be. The shell also has
an EXPLAIN command that will allow you to see the code
generated by your code generator before running it.
Step 1: Schema Loading¶
Before generating code, you must load the schema information from the chidb file, as it will be necessary for (1) knowing the root page of each table in the database and, (2) validating that a query is not just syntactically correct, but also valid for the given database.
For example, given a database with the following table:
CREATE TABLE courses(code INTEGER PRIMARY KEY, name TEXT, prof BYTE, dept INTEGER);
The following SQL statement is syntactically correct, but not valid for this table because it refers to columns that do not exist in the table:
SELECT name FROM courses WHERE quarter = "Spring" AND year = 2015;
Similarly, this SQL statement is syntactically correct, but not valid because dept
is
an integer column, not a string:
SELECT name FROM courses WHERE dept = "CMSC";
In this step, you will modify the implementation of chidb_open
in api.c
to
load the database schema into the chidb
struct (you will also need to modify
the definition of the chidb
struct in chidbInt.h
to do this).
Remember that chidb files have a special table rooted in page 1 that contains
the schema table. This is where you will have to load the
schema from (the B-Tree functions are enough to do this). The in-memory representation
of the schema is up to you, but we recommend you use the chisql_parser
function
(defined in include/chisql/
) to parse the CREATE TABLE
statement contained
in each row of the schema table. We also suggest you write functions that allow you to
easily perform the following operations:
Given a table name, determine whether such a table exists.
Given a table name, obtain its root page.
Given a table name and a column name, determine whether such a column exists in the table.
Given a table name and a column name, obtain the type of the column.
If you add functions to perform these operations, you should add them to utils.[ch]
.
Step 2: Simple SELECT Code Generation¶
You must generate code for a constrained subset of SELECT
that meets all the following constraints:
The query only involved a single table.
The query will include either a list of columns or will be a
SELECT *
.The query optionally has a
WHERE
clause but, when it does, it will always have a single condition be of the formcolumn OP value
, whereOP
can only be=
,>
,>=
,<
, or<=
, andvalue
can only be an integer or string literal.
Before generating the DBM code, you must check that the query is valid:
The table must exist.
The columns must exist in the specified table.
If a
WHERE
clause is included, the type ofcolumn
must match the type ofvalue
.
If any of these conditions are not met, chidb_stmt_codegen
must return CHIDB_EINVALIDSQL
.
Note that, besides generating the DBM code, you will also have to set appropriate values
for stmt->nCols
and stmt->cols
.
We suggest you look at the code in tests/files/dbm-programs/select/
for examples
of DBM programs that correspond to various SELECT
queries. You may also find it
useful to use the EXPLAIN
command in SQLite (not chidb) to see the kind of code
that is generated by the SQLite code generator.
Please take into account that chidb is a subset of SQLite, so you will see
instructions that are not part of the chidb architecture, or which have slightly
different parameters.
The DBMF files to test your SELECT
code generation are located
in tests/files/dbm-programs/sql-select/
.
You can run just those DBM programs by running the following:
make tests/check_dbm && CK_RUN_SUITE="dbm-sql-select" tests/check_dbm
Step 3: INSERT Code Generation¶
You must generate code for a constrained subset of INSERT
that meets all the following constraints:
The
INSERT
always specifies values for all columns. There are no default values.The name of the table is never followed by a list of column names.
The values are integer or string literals (never
NULL
).
Before generating the DBM code, you must check that the statement is valid:
The table must exist.
The types of the values must match the types of the columns.
If any of these conditions are not met, chidb_stmt_codegen
must return CHIDB_EINVALIDSQL
.
We suggest you look at the code in tests/files/dbm-programs/insert/
for an example
of a DBM program that corresponds to an INSERT
statement.
The DBMF files to test your INSERT
code generation are located
in tests/files/dbm-programs/sql-insert/
.
You can run just those DBM programs by running the following:
make tests/check_dbm && CK_RUN_SUITE="dbm-sql-insert" tests/check_dbm
Step 4: CREATE TABLE Code Generation¶
You must generate code for a constrained subset of CREATE TABLE
that meets all the following constraints:
Each colum in the list of columns will always be of the form
name type
, wheretype
can be eitherTEXT
orINTEGER
.The first column will always be of type
INTEGER PRIMARY KEY
.No additional constraints can be specified on the table.
Before generating the DBM code, you must check that the statement is valid. This only requires checking
that the database does not already have a table with the same name. If it does, then
chidb_stmt_codegen
must return CHIDB_EINVALIDSQL
.
We suggest you look at the code in tests/files/dbm-programs/create/
for an example
of a DBM program that corresponds to a CREATE
statement.
The DBMF files to test your CREATE
code generation are located
in tests/files/dbm-programs/sql-create/
.
You can run just those DBM programs by running the following:
make tests/check_dbm && CK_RUN_SUITE="dbm-sql-create" tests/check_dbm
Step 5: NATURAL JOIN Code Generation¶
You must generate code for SELECT
queries involving a NATURAL JOIN
of two tables.
You must continue to support the features in Step 2, taking into account that column names
may be qualified with a table name (e.g., courses.dept
to indicate column dept
of
table courses
).
Tests for NATURAL JOIN
are not currently available, but will be posted soon.