I. Query Handling

Ishaan Lagwankar edited this page Dec 3, 2019 · 1 revision

Query Handling

User queries must follow a predefined syntax set by the framework, which is modeled on standard SQL. The engine defines three main operations: load, select/project, and delete.

Load

  1. A schema is loaded with the syntax load <csv_file_path> as <schema_specification>;
  2. This command stores the CSV schema as a dictionary in the metastore, which is kept on the local disk of the client system. Keeping the metastore local reduces schema access time, mainly by avoiding I/O against the Hadoop cluster.
  3. The worker nodes are shielded from the schema and know only about the MapReduce job they are processing.
  4. Loading also resolves conflicting schemas for the same file by overwriting the old schema with the new one.
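
The load behaviour described above can be sketched as follows. This is a minimal illustration and not the engine's actual code: the schema specification format `(name:type, ...)`, the JSON file used as the metastore, and the names `parse_load` and `load_schema` are all assumptions made for the example.

```python
import json
import re
from pathlib import Path

# Hypothetical grammar: load <csv_file_path> as (col:type, col:type, ...);
LOAD_RE = re.compile(
    r"^\s*load\s+(?P<path>\S+)\s+as\s+\((?P<spec>.*)\)\s*;\s*$",
    re.IGNORECASE,
)


def parse_load(query):
    """Split a load query into the CSV path and a {column: type} dictionary."""
    match = LOAD_RE.match(query)
    if match is None:
        raise ValueError(f"not a valid load query: {query!r}")
    columns = {}
    for item in match.group("spec").split(","):
        name, _, col_type = item.partition(":")
        name, col_type = name.strip(), col_type.strip()
        if not name or not col_type:
            raise ValueError(f"bad column specification: {item!r}")
        columns[name] = col_type
    return match.group("path"), columns


def load_schema(query, metastore_path):
    """Store the schema in a local JSON metastore (assumed format).

    An existing schema for the same CSV path is overwritten.
    """
    csv_path, columns = parse_load(query)
    store_file = Path(metastore_path)
    metastore = json.loads(store_file.read_text()) if store_file.exists() else {}
    metastore[csv_path] = columns  # a conflicting older schema is replaced
    store_file.write_text(json.dumps(metastore, indent=2))
    return metastore
```

Loading the same file twice with different specifications leaves only the most recent schema in the metastore, which matches the overwrite rule in step 4.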

Select / Project

  1. Select/project queries are written as select <columns> from <csv_file_path> where <conditions>;
  2. Parsing this query provisions a MapReduce job. A mapper and a reducer are generated per user, so multiple users on the same infrastructure can access the engine simultaneously.
  3. The mapper and reducer are produced by code generation methods, which run in the background once the query has passed its sanity checks.
  4. The SQL engine supports select and project queries, including multiple aggregations and condition-based filtering.
  5. Aggregations such as max, min, sum, average and count work together with standard where clauses, providing an SQL-like interface for simple queries over large amounts of data.
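
The flow from a select query to a map and reduce step can be sketched as below. This is only an illustration under several assumptions: the engine generates mapper and reducer code, whereas this sketch builds plain Python closures instead; only a single `column op value` condition is supported; the schema is a `{column: type}` dictionary in CSV column order; and all function names are hypothetical. The job is simulated locally rather than submitted to Hadoop.

```python
import operator
import re

SELECT_RE = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<path>\S+)"
    r"(?:\s+where\s+(?P<cond>.+?))?\s*;\s*$",
    re.IGNORECASE,
)
COND_RE = re.compile(r"^(\w+)\s*(<=|>=|!=|=|<|>)\s*(\S+)$")
AGG_RE = re.compile(r"^(max|min|sum|average|count)\((\w+)\)$", re.IGNORECASE)
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}
CASTS = {"int": int, "float": float, "string": str}  # assumed type names


def parse_select(query):
    """Return (column expressions, csv path, condition or None)."""
    match = SELECT_RE.match(query)
    if match is None:
        raise ValueError(f"not a valid select query: {query!r}")
    cols = [c.strip() for c in match.group("cols").split(",")]
    return cols, match.group("path"), match.group("cond")


def make_mapper(cols, cond, schema):
    """Build a mapper that filters rows and emits (expression, value) pairs."""
    names = list(schema)
    test = None
    if cond is not None:
        cond_match = COND_RE.match(cond.strip())
        if cond_match is None:
            raise ValueError(f"unsupported condition: {cond!r}")
        col, op, raw = cond_match.groups()
        value = CASTS[schema[col]](raw)
        test = lambda row: OPS[op](row[col], value)

    def mapper(line):
        fields = line.rstrip("\n").split(",")
        row = {n: CASTS[schema[n]](f) for n, f in zip(names, fields)}
        if test is not None and not test(row):
            return  # row rejected by the where clause
        for expr in cols:
            agg = AGG_RE.match(expr)
            yield expr, row[agg.group(2) if agg else expr]

    return mapper


def reduce_values(expr, values):
    """Fold the values of one key; plain columns are returned unchanged."""
    agg = AGG_RE.match(expr)
    if agg is None:
        return values
    name = agg.group(1).lower()
    if name == "count":
        return len(values)
    if name == "average":
        return sum(values) / len(values)
    return {"max": max, "min": min, "sum": sum}[name](values)


def run_local(query, schema, lines):
    """Simulate the map, shuffle and reduce phases in a single process."""
    cols, _path, cond = parse_select(query)
    mapper = make_mapper(cols, cond, schema)
    grouped = {}
    for line in lines:
        for key, value in mapper(line):
            grouped.setdefault(key, []).append(value)
    return {key: reduce_values(key, vals) for key, vals in grouped.items()}
```

For example, `run_local("select max(age) from /data/users.csv where age > 30;", {"name": "string", "age": "int"}, rows)` filters the rows in the map step and computes the maximum in the reduce step. In a real deployment the mapper and reducer would be emitted as separate programs and run on the cluster.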

Delete

  1. The delete query is written as delete <csv_file_path>;
  2. The delete query does not delete the file from HDFS, as other applications might be using it. It only removes the schema from the metastore, so that a user who wants to modify the schema can load a new one for the same file.
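
A minimal sketch of this behaviour is shown below. It assumes the metastore is a local JSON file that maps each CSV path to its schema; that file format and the name `delete_schema` are assumptions for illustration, not the engine's actual implementation. Note that the function never touches the CSV file itself.

```python
import json
import re
from pathlib import Path

DELETE_RE = re.compile(r"^\s*delete\s+(?P<path>\S+?)\s*;\s*$", re.IGNORECASE)


def delete_schema(query, metastore_path):
    """Remove a schema entry from the local metastore (assumed JSON file).

    Returns True if a schema was removed, False if none was registered.
    The data file in HDFS is left untouched.
    """
    match = DELETE_RE.match(query)
    if match is None:
        raise ValueError(f"not a valid delete query: {query!r}")
    store_file = Path(metastore_path)
    metastore = json.loads(store_file.read_text()) if store_file.exists() else {}
    removed = metastore.pop(match.group("path"), None) is not None
    if removed:
        store_file.write_text(json.dumps(metastore, indent=2))
    return removed
```

After a successful delete, a later load for the same path simply registers a fresh schema.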