PROC SQL in SAS: Querying and Joining Data the SQL Way
PROC SQL bridges the gap between traditional SAS programming and the relational database world. Analysts who already know SQL can query SAS datasets immediately, while SAS veterans gain a powerful alternative to MERGE and PROC SORT that often produces cleaner, more readable code. Understanding when to use PROC SQL versus the DATA step and other procedures is a key competency for any SAS programmer working in business intelligence.
Basic SELECT Queries and Filtering
The SELECT statement in PROC SQL mirrors ANSI SQL almost exactly. You specify the columns you want, the table to read from, and an optional WHERE clause to filter rows. Unlike the DATA step, PROC SQL does not require sorted input — it handles sorting internally as needed. You can compute new columns inline, apply functions like ROUND, SUBSTR, and UPCASE, and alias them with the AS keyword. The DISTINCT keyword eliminates duplicate rows in one pass, replacing the PROC SORT plus DATA step deduplication pattern many analysts default to.
Aggregation with GROUP BY and HAVING
GROUP BY condenses many rows into summary statistics per group — essential for financial summaries, regional sales reports, and demographic breakdowns. Pair it with COUNT, SUM, AVG, MIN, and MAX to produce the aggregates you need. The HAVING clause filters on aggregated results, letting you surface only groups where total revenue exceeds a threshold or average response rate falls below a benchmark. This HAVING filter runs after aggregation, unlike WHERE, which runs before — a distinction that trips up analysts new to SQL and causes subtle errors in grouped analyses.
Joins: Combining Tables on Key Variables
PROC SQL supports inner joins, left joins, right joins, and full outer joins, all using standard SQL syntax. Inner joins return only rows with matching keys in both tables. Left joins preserve all rows from the left table and fill unmatched right-table columns with missing values — useful for keeping all customers even when some have no transactions. When joining on multiple keys, list each condition in the ON clause connected with AND. For large tables, ensure key columns are indexed or that you are joining on variables the engine can resolve efficiently; otherwise, Cartesian products can silently inflate your output and consume enormous memory.
Creating Tables, Views, and Macro Variables
Use CREATE TABLE to save query results as a new SAS dataset, or CREATE VIEW to define a virtual table that re-executes its query each time it is referenced — ideal for always-current reports. The INTO : clause captures scalar query results into macro variables, enabling dynamic parameterization of subsequent code. For example, you can capture the maximum date in a transaction log into a macro variable and use it to label a report automatically. Combining PROC SQL with the SAS macro facility unlocks a level of automation that dramatically reduces manual report maintenance. Visit our SASBase platform or contact us to learn more about optimizing your SAS workflows.