PROC SQL in SAS: Writing Queries Like a Pro
PROC SQL brings the full power of Structured Query Language into SAS, allowing you to query, join, and transform data using syntax familiar to anyone with database experience. For many operations, PROC SQL is more concise than equivalent DATA step code, and understanding both approaches makes you a more versatile SAS programmer.
PROC SQL Basics
PROC SQL operates within a single PROC block — you can run multiple SELECT statements without repeating PROC SQL and QUIT. The basic structure: PROC SQL; SELECT col1, col2 FROM library.dataset WHERE condition ORDER BY col1; QUIT;. Unlike most PROC steps, PROC SQL does not require a RUN statement — it executes immediately, and QUIT ends the PROC SQL session.
SAS datasets are referenced using the library.dataset notation in the FROM clause, just as in DATA steps. You can create new datasets with the CREATE TABLE statement: CREATE TABLE work.summary AS SELECT dept, AVG(salary) AS avg_sal FROM mydata.employees GROUP BY dept;
Joining Tables
PROC SQL supports all standard SQL join types. An INNER JOIN: SELECT a.*, b.dept_name FROM employees a INNER JOIN departments b ON a.dept_id = b.dept_id;. Unlike DATA step MERGE, SQL joins do not require pre-sorting. LEFT JOIN keeps all rows from the left table with NULLs where no match exists on the right. FULL JOIN and RIGHT JOIN are also supported.
Subqueries and Calculated Variables
PROC SQL supports correlated and non-correlated subqueries in WHERE clauses, FROM clauses (inline views), and HAVING clauses. Calculated columns use the CALCULATED keyword to reference columns defined in the same SELECT: SELECT salary, salary * 1.1 AS new_salary, CALCULATED new_salary - salary AS raise FROM employees;. This avoids repeating expressions.
Using Macro Variables with SQL
The INTO clause stores query results into macro variables: SELECT COUNT(*) INTO :nrows FROM mydata.employees; stores the count in macro variable &nrows, which can then be used in subsequent code or messages. For multiple values, INTO :varname SEPARATED BY ',' creates a comma-delimited list useful for dynamic WHERE IN() clauses.
Read our overview of SAS statistical procedures to move from data preparation to analysis, or explore our resources for recommended SAS learning materials.