SAS has been the backbone of enterprise analytics for decades, powering everything from regulatory reporting in banking to clinical trial analysis in pharmaceuticals. But as organizations modernize their data infrastructure around cloud data platforms, SAS's proprietary licensing model, on-premise architecture, and closed ecosystem increasingly conflict with the flexibility, scalability, and cost transparency that modern data teams demand. Snowflake, with its elastic compute, native SQL engine, and Snowpark Python framework, provides a compelling migration target that can absorb the full breadth of SAS workloads — from simple DATA step transformations to complex statistical procedures.
This article provides a detailed technical mapping of SAS programming concepts to their Snowflake equivalents, covering DATA steps, PROC SQL, SAS Macros, statistical procedures, format catalogs, libname references, and the fundamental shift from SAS Grid computing to Snowflake virtual warehouses. Whether you are migrating thousands of SAS programs accumulated over twenty years or selectively re-platforming critical pipelines, understanding these mappings is the essential first step.
SAS Architecture vs. Snowflake Architecture
SAS operates on a tightly integrated architecture where the SAS runtime reads data from SAS datasets (sas7bdat files) or external databases via libname engines, processes transformations in the SAS workspace server's memory, and writes results back to datasets or external targets. SAS Grid Manager distributes workloads across multiple compute nodes, but capacity planning is manual and infrastructure is provisioned for peak loads.
Snowflake separates storage and compute entirely. Data lives in Snowflake's managed cloud storage layer (automatically compressed, encrypted, and partitioned into micro-partitions), while virtual warehouses provide elastic compute that scales up (larger warehouse) or out (multi-cluster) independently. Snowpark enables Python, Java, and Scala DataFrames that push computation down to Snowflake's engine — meaning transformation logic runs where the data lives, not on a separate application server. This architecture eliminates the data movement bottleneck that plagues SAS environments where data must be transferred from a database to the SAS workspace server for processing.
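To make the pushdown model concrete, the minimal Snowpark sketch below (the table name and connection_params are illustrative) shows that DataFrame operations compile to SQL executed inside Snowflake; no rows leave the platform until an action such as collect() or to_pandas() runs.
# Snowpark pushdown: DataFrame operations compile to SQL that runs in Snowflake
from snowflake.snowpark import Session, functions as F

session = Session.builder.configs(connection_params).create()  # connection_params assumed
df = session.table("staging.orders").filter(F.col("ORDER_AMOUNT") > 100)
print(df.queries["queries"])  # the SQL text Snowflake will execute for this DataFrame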
| SAS Concept | Snowflake Equivalent | Notes |
|---|---|---|
| DATA step | Snowpark Python DataFrame / SQL INSERT...SELECT | Row-level logic maps to Snowpark; set-based logic maps to SQL |
| PROC SQL | Snowflake SQL | Nearly 1:1 mapping; ANSI SQL compatible |
| SAS Macro (%macro/%mend) | Snowflake Stored Procedure / Snowpark UDF | Parameterized reusable logic |
| SAS Dataset (sas7bdat) | Snowflake Table | Managed columnar storage with auto-clustering |
| SAS Library (LIBNAME) | Database.Schema | Two-level naming: database.schema.table |
| SAS Formats / Informats | CASE expressions / TO_CHAR / TRY_CAST | Display formatting handled at query or BI layer |
| PROC MEANS / PROC SUMMARY | SQL Aggregates (AVG, STDDEV, PERCENTILE_CONT) | Full analytic function support |
| PROC FREQ | GROUP BY with COUNT / RATIO_TO_REPORT | Cross-tabulations via PIVOT |
| PROC SORT | ORDER BY / CLUSTER BY | Snowflake auto-clusters; explicit sort only for output |
| PROC TRANSPOSE | PIVOT / UNPIVOT | Native SQL syntax for reshaping |
| SAS Macro Variables (&var) | Session Variables / Procedure Arguments | SET variable = value; or procedure parameters |
| SAS Grid Manager | Virtual Warehouses (auto-scale) | Elastic, auto-suspend, auto-resume compute |
| SAS Scheduling (LSF/Cron) | Snowflake Tasks | CRON-based scheduling with DAG dependencies |
| SAS Enterprise Guide Projects | Snowsight Worksheets / Notebooks | Interactive development with version control |
| SAS/ACCESS Engines | Stages / Storage Integrations / Connectors | Native cloud storage access and partner connectors |
DATA Step to Snowpark Python
The SAS DATA step is the most distinctive feature of SAS programming. It processes data row by row through an implicit loop, with features like RETAIN, first./last. processing, arrays, and conditional OUTPUT that have no direct SQL equivalent. Snowpark Python provides the closest functional equivalent because it supports both set-based DataFrame operations and row-level UDF logic that executes on Snowflake's distributed compute engine.
Simple DATA Step with Conditional Logic and Multiple OUTPUT
Consider a SAS DATA step that reads a customer transactions dataset, applies business rules to classify transactions, and outputs to multiple datasets based on conditions. This is one of the most common SAS patterns — a single DATA step that reads one input and conditionally writes to multiple output datasets.
/* SAS DATA step with IF/THEN/OUTPUT */
data high_value medium_value low_value;
set work.transactions;
length risk_category $12;
total_amount = quantity * unit_price;
discount_amount = total_amount * (discount_pct / 100);
net_amount = total_amount - discount_amount;
if net_amount >= 10000 then do;
risk_category = 'HIGH';
output high_value;
end;
else if net_amount >= 1000 then do;
risk_category = 'MEDIUM';
output medium_value;
end;
else do;
risk_category = 'LOW';
output low_value;
end;
run;
# Snowpark Python equivalent
from snowflake.snowpark import Session, functions as F
session = Session.builder.configs(connection_params).create()
# Read source table (equivalent of SET statement)
transactions = session.table("staging.transactions")
# Apply calculated columns (equivalent of assignment statements)
enriched = transactions.with_column(
    "TOTAL_AMOUNT", F.col("QUANTITY") * F.col("UNIT_PRICE")
).with_column(
    "DISCOUNT_AMOUNT", F.col("TOTAL_AMOUNT") * (F.col("DISCOUNT_PCT") / 100)
).with_column(
    "NET_AMOUNT", F.col("TOTAL_AMOUNT") - F.col("DISCOUNT_AMOUNT")
).with_column(
    "RISK_CATEGORY",
    F.when(F.col("NET_AMOUNT") >= 10000, F.lit("HIGH"))
    .when(F.col("NET_AMOUNT") >= 1000, F.lit("MEDIUM"))
    .otherwise(F.lit("LOW"))
)
# Output to multiple tables (equivalent of multiple OUTPUT statements)
enriched.filter(F.col("RISK_CATEGORY") == "HIGH") \
.write.mode("overwrite").save_as_table("analytics.high_value")
enriched.filter(F.col("RISK_CATEGORY") == "MEDIUM") \
.write.mode("overwrite").save_as_table("analytics.medium_value")
enriched.filter(F.col("RISK_CATEGORY") == "LOW") \
.write.mode("overwrite").save_as_table("analytics.low_value")
DATA Step with RETAIN and First/Last Processing
SAS's RETAIN statement and first./last. automatic variables enable running totals, group-level aggregation, and state tracking across rows. These are among the most complex SAS patterns to migrate because they depend on the implicit row-by-row loop and ordered processing. In Snowflake, window functions provide equivalent capabilities without requiring row-by-row processing, and they execute in parallel across the distributed compute engine.
/* SAS: Running total within each customer group */
proc sort data=work.orders; by customer_id order_date; run;
data work.order_running;
set work.orders;
by customer_id order_date;
retain running_total 0;
if first.customer_id then running_total = 0;
running_total + order_amount;
if last.customer_id then is_final_record = 1;
else is_final_record = 0;
run;
-- Snowflake SQL equivalent using window functions
CREATE OR REPLACE TABLE analytics.order_running AS
SELECT
*,
SUM(order_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
CASE
WHEN ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY order_date DESC
) = 1 THEN 1 ELSE 0
END AS is_final_record
FROM staging.orders
ORDER BY customer_id, order_date;
SAS DATA step processing is inherently row-sequential, which limits parallelism. Snowflake window functions achieve the same results but execute across Snowflake's distributed compute engine, often completing in seconds what SAS processes in minutes on large datasets. MigryX's AST-based deterministic parser identifies RETAIN/first./last. patterns and automatically maps them to the appropriate window function constructs, achieving 95%+ parser accuracy even on deeply nested SAS logic.
MigryX: Idiomatic Code, Not Line-by-Line Translation
The difference between MigryX and manual migration is not just speed — it is code quality. MigryX generates idiomatic, platform-optimized code that leverages native features of your target platform. A SAS DATA step does not become a clunky row-by-row loop — it becomes a clean, vectorized DataFrame operation. A PROC SQL query does not become a literal translation — it becomes an optimized query that takes advantage of your platform’s pushdown capabilities.
PROC SQL to Snowflake SQL
PROC SQL is the most straightforward migration path because SAS's SQL implementation closely follows ANSI SQL. However, there are important syntactic differences and SAS-specific extensions that require careful handling during conversion. The calculated keyword, SAS date literals, SELECT INTO for macro variable creation, and MONOTONIC() are the most frequently encountered conversion points.
PROC SQL with Subquery and INTO Clause
/* SAS PROC SQL with subquery and macro variable creation */
proc sql;
create table work.top_customers as
select
c.customer_id,
c.customer_name,
c.segment,
sum(o.order_amount) as total_revenue,
count(distinct o.order_id) as order_count,
calculated total_revenue / calculated order_count as avg_order_value
from work.customers c
inner join work.orders o
on c.customer_id = o.customer_id
where o.order_date >= '01JAN2025'd
group by c.customer_id, c.customer_name, c.segment
having calculated total_revenue > (
select avg(sub_total)
from (
select sum(order_amount) as sub_total
from work.orders
where order_date >= '01JAN2025'd
group by customer_id
)
)
order by total_revenue desc;
select count(*) into :top_count trimmed from work.top_customers;
quit;
%put NOTE: Found &top_count top customers;
-- Snowflake SQL equivalent
CREATE OR REPLACE TABLE analytics.top_customers AS
WITH customer_revenue AS (
SELECT
c.customer_id,
c.customer_name,
c.segment,
SUM(o.order_amount) AS total_revenue,
COUNT(DISTINCT o.order_id) AS order_count
FROM staging.customers c
INNER JOIN staging.orders o
ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01'
GROUP BY c.customer_id, c.customer_name, c.segment
)
SELECT
customer_id,
customer_name,
segment,
total_revenue,
order_count,
total_revenue / NULLIF(order_count, 0) AS avg_order_value
FROM customer_revenue
WHERE total_revenue > (
SELECT AVG(sub_total)
FROM (
SELECT SUM(order_amount) AS sub_total
FROM staging.orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id
)
)
ORDER BY total_revenue DESC;
-- Equivalent of SELECT INTO :macro_variable
SET top_count = (SELECT COUNT(*) FROM analytics.top_customers);
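The session variable is then referenced with the $ prefix, the rough equivalent of &top_count in the %put statement:
-- Reference the session variable (equivalent of %put NOTE: ... &top_count)
SELECT 'NOTE: Found ' || $top_count || ' top customers' AS log_message;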
| SAS PROC SQL Feature | Snowflake SQL Equivalent | Migration Notes |
|---|---|---|
| calculated keyword | CTE or repeated expression | Snowflake resolves aliases within the SELECT list, but not in WHERE or HAVING; use a CTE there |
| '01JAN2025'd date literal | '2025-01-01' ISO format | All SAS date literals must be converted to ISO 8601 |
| SELECT INTO :var | SET var = (SELECT ...) | Session variables replace macro variables |
| MONOTONIC() | ROW_NUMBER() OVER (ORDER BY ...) | Explicit ordering required in Snowflake |
| CREATE TABLE AS | CREATE OR REPLACE TABLE AS | Snowflake supports OR REPLACE for idempotent DDL |
| CONNECT TO lib (sql) | Direct SQL (data already in Snowflake) | Pass-through SQL unnecessary when all data is in Snowflake |
| CASE WHEN ... THEN | Identical syntax | Directly compatible; no conversion needed |
| COALESCE / IFNULL | COALESCE / NVL / IFF | Snowflake provides multiple null-handling functions |
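The MONOTONIC() conversion deserves a concrete example because it is a common silent-failure point: MONOTONIC() numbers rows in arrival order, which has no stable meaning on a distributed engine, so the Snowflake version must name an explicit ordering column. A minimal sketch with illustrative table names:
/* SAS: row numbering via the undocumented MONOTONIC() function */
proc sql;
create table work.numbered as
select monotonic() as row_num, customer_id
from work.customers;
quit;
-- Snowflake: an explicit, deterministic ordering is required
CREATE OR REPLACE TABLE analytics.numbered AS
SELECT
ROW_NUMBER() OVER (ORDER BY customer_id) AS row_num,
customer_id
FROM staging.customers;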
SAS Macros to Snowflake Stored Procedures and Snowpark UDFs
SAS Macros generate and execute SAS code dynamically using the macro preprocessor. They range from simple variable substitution (&var) to complex program generators with conditional logic, loops, and nested macro calls. Snowflake stored procedures written in Snowflake Scripting (SQL-based) or Snowpark Python provide equivalent parameterization and dynamic execution capabilities.
SAS Macro to Snowflake Stored Procedure
/* SAS Macro: Process monthly data for a given table and date range */
%macro process_monthly(input_table=, start_date=, end_date=, output_schema=);
%let start_dt = %sysfunc(inputn(&start_date, yymmdd10.), date9.);
%let end_dt = %sysfunc(inputn(&end_date, yymmdd10.), date9.);
proc sql;
create table &output_schema..monthly_summary as
select
customer_segment,
intnx('month', transaction_date, 0, 'B') as month_start format=date9.,
count(*) as txn_count,
sum(amount) as total_amount,
mean(amount) as avg_amount,
std(amount) as std_amount
from &input_table
where transaction_date between "&start_dt"d and "&end_dt"d
group by customer_segment,
intnx('month', transaction_date, 0, 'B')
order by month_start, customer_segment;
quit;
%put NOTE: Monthly summary created in &output_schema;
%mend process_monthly;
%process_monthly(
input_table=warehouse.transactions,
start_date=2025-01-01,
end_date=2025-12-31,
output_schema=work
);
-- Snowflake Stored Procedure equivalent
CREATE OR REPLACE PROCEDURE analytics.process_monthly(
INPUT_TABLE VARCHAR,
START_DATE DATE,
END_DATE DATE,
OUTPUT_SCHEMA VARCHAR
)
RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$
DECLARE
row_count INTEGER;
full_table_name VARCHAR;
BEGIN
full_table_name := OUTPUT_SCHEMA || '.monthly_summary';
EXECUTE IMMEDIATE '
CREATE OR REPLACE TABLE ' || :full_table_name || ' AS
SELECT
customer_segment,
DATE_TRUNC(''MONTH'', transaction_date) AS month_start,
COUNT(*) AS txn_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount,
STDDEV(amount) AS std_amount
FROM ' || :INPUT_TABLE || '
WHERE transaction_date BETWEEN ''' || :START_DATE || ''' AND ''' || :END_DATE || '''
GROUP BY customer_segment, DATE_TRUNC(''MONTH'', transaction_date)
ORDER BY month_start, customer_segment
';
SELECT COUNT(*) INTO :row_count FROM IDENTIFIER(:full_table_name);
RETURN 'Monthly summary created with ' || row_count || ' rows';
END;
$$;
-- Execute the procedure (equivalent of %process_monthly call)
CALL analytics.process_monthly(
'warehouse.transactions',
'2025-01-01'::DATE,
'2025-12-31'::DATE,
'analytics'
);
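For macros whose body is easier to express as DataFrame logic than as string-built SQL, the same procedure can be written in Snowpark Python. A minimal sketch, reusing the table and column names from the example above; registration details (package list, permanence, stage) vary by environment:
# Snowpark Python alternative to the Snowflake Scripting procedure above.
# Table and column names mirror the example; registration options are illustrative.
from snowflake.snowpark import Session, functions as F

def process_monthly_py(session: Session, input_table: str, start_date: str,
                       end_date: str, output_schema: str) -> str:
    summary = (
        session.table(input_table)
        .filter(F.col("TRANSACTION_DATE").between(start_date, end_date))
        .with_column("MONTH_START", F.date_trunc("month", F.col("TRANSACTION_DATE")))
        .group_by("CUSTOMER_SEGMENT", "MONTH_START")
        .agg(
            F.count(F.lit(1)).alias("TXN_COUNT"),
            F.sum("AMOUNT").alias("TOTAL_AMOUNT"),
            F.avg("AMOUNT").alias("AVG_AMOUNT"),
            F.stddev("AMOUNT").alias("STD_AMOUNT"),
        )
    )
    summary.write.mode("overwrite").save_as_table(f"{output_schema}.monthly_summary")
    return f"Monthly summary created in {output_schema}"

# Register as a (temporary) stored procedure and call it
session.sproc.register(process_monthly_py, name="process_monthly_py", replace=True,
                       packages=["snowflake-snowpark-python"])
session.call("process_monthly_py", "warehouse.transactions",
             "2025-01-01", "2025-12-31", "analytics")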
SAS Formats and Informats to Snowflake Expressions
SAS formats control how data values are displayed without changing the underlying stored value. Custom formats created with PROC FORMAT are a core SAS pattern for mapping codes to labels, binning continuous variables, and applying business rules. In Snowflake, this logic is implemented through CASE expressions, lookup tables, or Snowpark UDFs. Understanding this mapping is critical because large SAS environments can contain hundreds of custom formats referenced across thousands of programs.
/* SAS PROC FORMAT: Custom format for risk scoring */
proc format;
value risk_fmt
0 -< 300 = 'Critical'
300 -< 600 = 'High'
600 -< 750 = 'Medium'
750 - high = 'Low';
value $region_fmt
'NE','MA','CT','NY' = 'Northeast'
'CA','WA','OR' = 'West'
'TX','FL','GA' = 'South'
other = 'Other';
run;
data work.scored;
set work.accounts;
risk_label = put(credit_score, risk_fmt.);
region_name = put(state_code, $region_fmt.);
run;
-- Snowflake SQL equivalent using CASE expressions
CREATE OR REPLACE TABLE analytics.scored AS
SELECT
*,
CASE
WHEN credit_score >= 0 AND credit_score < 300 THEN 'Critical'
WHEN credit_score >= 300 AND credit_score < 600 THEN 'High'
WHEN credit_score >= 600 AND credit_score < 750 THEN 'Medium'
WHEN credit_score >= 750 THEN 'Low'
END AS risk_label,
CASE state_code
WHEN 'NE' THEN 'Northeast'
WHEN 'MA' THEN 'Northeast'
WHEN 'CT' THEN 'Northeast'
WHEN 'NY' THEN 'Northeast'
WHEN 'CA' THEN 'West'
WHEN 'WA' THEN 'West'
WHEN 'OR' THEN 'West'
WHEN 'TX' THEN 'South'
WHEN 'FL' THEN 'South'
WHEN 'GA' THEN 'South'
ELSE 'Other'
END AS region_name
FROM staging.accounts;
-- Alternative: Lookup table approach for large format catalogs
CREATE OR REPLACE TABLE ref.region_mapping (
state_code VARCHAR(4) PRIMARY KEY,
region_name VARCHAR(20)
);
INSERT INTO ref.region_mapping VALUES
('NE','Northeast'),('MA','Northeast'),('CT','Northeast'),('NY','Northeast'),
('CA','West'),('WA','West'),('OR','West'),
('TX','South'),('FL','South'),('GA','South');
SELECT a.*, COALESCE(r.region_name, 'Other') AS region_name
FROM staging.accounts a
LEFT JOIN ref.region_mapping r ON a.state_code = r.state_code;
SAS format catalogs can contain hundreds of custom formats accumulated over years of development. MigryX's AST-based parser automatically extracts all PUT() function calls, resolves them against their PROC FORMAT definitions, and generates the equivalent CASE expressions or lookup table JOINs. MigryX also generates STTM (Source-to-Target Mapping) documentation that traces every format conversion for audit and validation purposes.
Statistical Procedures to SQL Aggregates and Window Functions
SAS's statistical procedures (PROC MEANS, PROC FREQ, PROC SUMMARY, PROC UNIVARIATE) are heavily used for reporting, data profiling, and regulatory analysis. Snowflake's SQL aggregate functions, window functions, and PIVOT/UNPIVOT syntax cover the vast majority of these use cases. For advanced statistical modeling (PROC REG, PROC LOGISTIC, PROC MIXED), Snowpark Python with libraries like scikit-learn provides equivalent capabilities.
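For procedures like PROC LOGISTIC that have no SQL counterpart, a common pattern is to assemble the modeling frame in Snowpark and fit the model with scikit-learn. A minimal sketch with a hypothetical staging.loans table and columns; for large datasets the training step would instead run inside a Snowpark stored procedure:
# Hedged sketch of a PROC LOGISTIC replacement using Snowpark + scikit-learn.
# staging.loans and its columns are hypothetical placeholders.
from snowflake.snowpark import Session
from sklearn.linear_model import LogisticRegression

session = Session.builder.configs(connection_params).create()  # connection_params assumed

# Assemble the modeling frame in Snowflake, then pull it locally as pandas
df = (
    session.table("staging.loans")
    .select("CREDIT_SCORE", "INCOME", "LOAN_AMOUNT", "DEFAULT_FLAG")
    .to_pandas()
)

X = df[["CREDIT_SCORE", "INCOME", "LOAN_AMOUNT"]]
y = df["DEFAULT_FLAG"]

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.coef_, model.intercept_)  # analogous to PROC LOGISTIC parameter estimates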
PROC MEANS / PROC SUMMARY
/* SAS PROC MEANS with CLASS and OUTPUT */
proc means data=work.sales n mean std min max p25 median p75;
class region product_category;
var revenue units;
output out=work.sales_stats
n= mean= std= min= max=
/ autoname;
run;
-- Snowflake SQL equivalent with full descriptive statistics
CREATE OR REPLACE TABLE analytics.sales_stats AS
SELECT
region,
product_category,
COUNT(revenue) AS revenue_n,
AVG(revenue) AS revenue_mean,
STDDEV(revenue) AS revenue_std,
MIN(revenue) AS revenue_min,
MAX(revenue) AS revenue_max,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue) AS revenue_p25,
MEDIAN(revenue) AS revenue_median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue) AS revenue_p75,
COUNT(units) AS units_n,
AVG(units) AS units_mean,
STDDEV(units) AS units_std,
MIN(units) AS units_min,
MAX(units) AS units_max,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY units) AS units_p25,
MEDIAN(units) AS units_median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY units) AS units_p75
FROM staging.sales
GROUP BY region, product_category
ORDER BY region, product_category;
PROC FREQ with Cross-Tabulation
/* SAS PROC FREQ cross-tab */
proc freq data=work.customers;
tables region * loyalty_tier / norow nocol nopercent;
run;
-- Snowflake equivalent using PIVOT for cross-tabulation
SELECT *
FROM (
SELECT region, loyalty_tier, customer_id
FROM staging.customers
)
PIVOT (
COUNT(customer_id)
FOR loyalty_tier IN ('Gold', 'Silver', 'Bronze', 'Standard')
)
ORDER BY region;
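When the loyalty tier values are not known in advance, Snowflake's dynamic pivot avoids hardcoding the IN list:
-- Dynamic pivot: derive the pivot columns from the data itself
SELECT *
FROM (
SELECT region, loyalty_tier, customer_id
FROM staging.customers
)
PIVOT (
COUNT(customer_id)
FOR loyalty_tier IN (ANY ORDER BY loyalty_tier)
)
ORDER BY region;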
LIBNAME References to Snowflake Database.Schema
SAS LIBNAME statements establish connections to data locations. In a typical SAS environment, multiple libnames point to different databases, file systems, or SAS data libraries. In Snowflake, the equivalent is the three-level naming convention: database.schema.table. Understanding this mapping is essential because libname references appear in virtually every SAS program.
| SAS LIBNAME Pattern | Snowflake Equivalent | Example |
|---|---|---|
| libname raw '/data/raw'; | Database + Schema | raw_db.landing.table_name |
| libname dwh oracle ...; | Direct table reference | warehouse_db.curated.table_name |
| libname work; (temporary) | Session-scoped temporary tables | CREATE TEMPORARY TABLE ... |
| libname xl xlsx '...'; | Stage + file format + COPY INTO | Load from staged Excel/CSV files |
| libname api json ...; | External function / VARIANT | Parse JSON via PARSE_JSON() |
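In practice the conversion is mechanical: a two-level SAS reference (libname.member) becomes a fully qualified database.schema.table, or the session context is set once so that unqualified names resolve the way the libname did. A short sketch with illustrative names:
-- SAS: libname dwh ...; set dwh.customers; -> fully qualified Snowflake name
SELECT * FROM warehouse_db.curated.customers;
-- Or set the session context once, the way a libname scoped a SAS program
USE DATABASE warehouse_db;
USE SCHEMA curated;
SELECT * FROM customers;
-- Equivalent of the WORK library: a session-scoped temporary table
CREATE TEMPORARY TABLE customers_tmp AS
SELECT * FROM customers;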
Platform-Specific Optimization by MigryX
MigryX maintains deep knowledge of every target platform’s strengths and best practices. When converting to Snowflake, it leverages Snowpark and native SQL functions. When targeting Databricks, it uses PySpark DataFrame operations optimized for distributed execution. When generating dbt models, it follows dbt best practices for modularity and testability. This platform awareness is what makes MigryX output production-ready from day one.
SAS Data Ingestion to Snowflake Stages and Snowpipe
SAS programs frequently use INFILE/INPUT statements to read flat files, or LIBNAME engines to connect to databases. In Snowflake, data ingestion is handled through Stages (internal or external), COPY INTO for batch loading, and Snowpipe for continuous real-time ingestion. Storage Integrations provide secure, credential-free access to cloud storage buckets.
/* SAS: Read CSV with custom delimiter and formatting */
data work.claims;
infile '/data/incoming/claims_20250401.csv'
dlm='|' firstobs=2 missover truncover;
input
claim_id :$12.
patient_id :$10.
service_date :yymmdd10.
diagnosis_code :$8.
billed_amount :12.2
paid_amount :12.2;
format service_date date9.;
run;
-- Snowflake equivalent: Stage + File Format + COPY INTO
CREATE OR REPLACE FILE FORMAT claims_csv_format
TYPE = 'CSV'
FIELD_DELIMITER = '|'
SKIP_HEADER = 1
NULL_IF = ('')
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE;
-- External stage with Storage Integration (no credentials in SQL)
CREATE OR REPLACE STAGE claims_stage
URL = 's3://company-data/incoming/'
STORAGE_INTEGRATION = s3_prod_integration
FILE_FORMAT = claims_csv_format;
-- Batch load
COPY INTO staging.claims (
claim_id, patient_id, service_date,
diagnosis_code, billed_amount, paid_amount
)
FROM @claims_stage/claims_20250401.csv
FILE_FORMAT = claims_csv_format
ON_ERROR = 'CONTINUE';
-- Or continuous auto-ingestion via Snowpipe
CREATE OR REPLACE PIPE staging.claims_pipe
AUTO_INGEST = TRUE
AS
COPY INTO staging.claims
FROM @claims_stage
FILE_FORMAT = claims_csv_format;
SAS Scheduling to Snowflake Tasks and Task DAGs
SAS programs are typically scheduled through operating system cron jobs, LSF (Load Sharing Facility), SAS Management Console, or Control-M. These external schedulers launch SAS sessions that execute programs in sequence. Snowflake Tasks provide native CRON-based scheduling with DAG dependencies, eliminating external scheduler dependencies and providing built-in monitoring, retry logic, and error handling.
-- Replace a SAS nightly batch schedule with Snowflake Task DAG
-- Root task: Load new data from stage
CREATE OR REPLACE TASK etl.load_daily_data
WAREHOUSE = etl_wh
SCHEDULE = 'USING CRON 0 2 * * * America/New_York'
AS
CALL etl.load_from_stage('claims_stage', 'staging.claims');
-- Child task: Transform and enrich (runs after load completes)
CREATE OR REPLACE TASK etl.transform_claims
WAREHOUSE = etl_wh
AFTER etl.load_daily_data
AS
CALL analytics.process_monthly(
'staging.claims', CURRENT_DATE() - 30, CURRENT_DATE(), 'analytics'
);
-- Child task: Build summary tables for dashboards
CREATE OR REPLACE TASK etl.build_summaries
WAREHOUSE = etl_wh
AFTER etl.transform_claims
AS
CREATE OR REPLACE TABLE gold.claims_dashboard AS
SELECT
DATE_TRUNC('WEEK', service_date) AS week_start,
diagnosis_code,
COUNT(*) AS claim_count,
SUM(billed_amount) AS total_billed,
SUM(paid_amount) AS total_paid,
AVG(paid_amount / NULLIF(billed_amount, 0)) AS avg_payment_ratio
FROM analytics.enriched_claims
GROUP BY 1, 2;
-- Enable the task tree (must resume bottom-up)
ALTER TASK etl.build_summaries RESUME;
ALTER TASK etl.transform_claims RESUME;
ALTER TASK etl.load_daily_data RESUME;
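Once resumed, the DAG can be smoke-tested on demand and monitored directly from SQL, replacing the log files a SAS scheduler would produce:
-- Trigger the root task manually for testing (children fire on completion)
EXECUTE TASK etl.load_daily_data;
-- Review recent runs, state, and errors (the equivalent of scanning SAS logs)
SELECT name, state, scheduled_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
ORDER BY scheduled_time DESC
LIMIT 20;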
Dynamic Tables: Declarative SAS Replacements
Many SAS programs exist solely to refresh a derived table on a schedule — read source data, apply transformations, and overwrite the target. These programs often span hundreds of lines of DATA step and PROC SQL code. Snowflake Dynamic Tables can replace these entire programs with a single declarative SQL definition that Snowflake automatically keeps fresh based on a configurable target lag.
-- Replace a 200-line SAS program that runs nightly to rebuild
-- a customer 360 view from multiple source tables
CREATE OR REPLACE DYNAMIC TABLE gold.customer_360
TARGET_LAG = '1 hour'
WAREHOUSE = analytics_wh
AS
SELECT
c.customer_id,
c.customer_name,
c.segment,
c.region,
o.total_orders,
o.total_revenue,
o.avg_order_value,
o.first_order_date,
o.last_order_date,
DATEDIFF('day', o.last_order_date, CURRENT_DATE()) AS days_since_last_order,
CASE
WHEN DATEDIFF('day', o.last_order_date, CURRENT_DATE()) <= 30 THEN 'Active'
WHEN DATEDIFF('day', o.last_order_date, CURRENT_DATE()) <= 90 THEN 'At Risk'
WHEN DATEDIFF('day', o.last_order_date, CURRENT_DATE()) <= 365 THEN 'Dormant'
ELSE 'Churned'
END AS lifecycle_stage
FROM staging.customers c
LEFT JOIN (
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(order_amount) AS total_revenue,
AVG(order_amount) AS avg_order_value,
MIN(order_date) AS first_order_date,
MAX(order_date) AS last_order_date
FROM staging.orders
GROUP BY customer_id
) o ON c.customer_id = o.customer_id;
Dynamic Tables are one of the most powerful simplifications available in a SAS-to-Snowflake migration. A SAS program that runs nightly with 200 lines of DATA step and PROC SQL code can often be replaced with a single Dynamic Table definition. Snowflake handles all refresh scheduling, incremental processing, and dependency management automatically. MigryX identifies candidate SAS programs that can be converted to Dynamic Tables during its analysis phase.
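Operationally, refresh behavior is tuned declaratively rather than by editing a scheduled program; for example:
-- Force an immediate refresh (useful when validating against SAS output)
ALTER DYNAMIC TABLE gold.customer_360 REFRESH;
-- Tighten or relax freshness without touching the transformation logic
ALTER DYNAMIC TABLE gold.customer_360 SET TARGET_LAG = '15 minutes';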
Semi-Structured Data: SAS vs. Snowflake
SAS handles JSON and XML through specialized libname engines or DATA step parsing with limited native support. Snowflake treats semi-structured data as a first-class citizen through the VARIANT data type, dot-notation traversal, PARSE_JSON(), and FLATTEN() for array expansion. This is a significant capability upgrade when migrating from SAS, where JSON processing is cumbersome and XML handling requires the SAS XML Mapper.
/* SAS: Parse JSON response (requires SAS JSON libname engine) */
libname api json '/data/api/customers.json';
data work.parsed_customers;
set api.root;
customer_id = ordinal_root; /* row ordinal assigned by the JSON engine */
keep customer_id name email;
run;
libname api clear;
-- Snowflake: Native semi-structured data handling
-- Load raw JSON into VARIANT column
CREATE OR REPLACE TABLE staging.api_raw (payload VARIANT);
COPY INTO staging.api_raw
FROM @api_stage/customers.json
FILE_FORMAT = (TYPE = 'JSON');
-- Parse and flatten with full type safety
SELECT
c.value:id::INTEGER AS customer_id,
c.value:name::STRING AS name,
c.value:email::STRING AS email,
c.value:address.city::STRING AS city,
c.value:address.state::STRING AS state,
t.value::STRING AS tag
FROM staging.api_raw r,
LATERAL FLATTEN(input => r.payload:customers) c,
LATERAL FLATTEN(input => c.value:tags, outer => true) t;
Time Travel and Data Recovery
SAS has no built-in mechanism for data versioning or recovery. If a SAS program overwrites a dataset incorrectly, the previous version is lost unless manual backups were taken. Snowflake's Time Travel feature provides automatic data versioning for up to 90 days, enabling recovery from accidental overwrites, deletions, or incorrect transformations.
-- Recover data after an incorrect SAS-to-Snowflake migration run
-- Query the table as it existed 2 hours ago
SELECT * FROM analytics.customer_360 AT (OFFSET => -7200);
-- Restore the table to its previous state
CREATE OR REPLACE TABLE analytics.customer_360 AS
SELECT * FROM analytics.customer_360 AT (OFFSET => -7200);
-- Query before a specific statement executed
SELECT * FROM analytics.customer_360 BEFORE (STATEMENT => '01abc123-0000-0000-0000-000000000001');
How MigryX Automates SAS to Snowflake Migration
Manual SAS-to-Snowflake migration is extraordinarily labor-intensive. A typical enterprise SAS estate contains thousands of programs with complex interdependencies, custom format catalogs, macro libraries, and undocumented business logic embedded in DATA steps. MigryX's platform automates this conversion through several key capabilities that differentiate it from manual migration or simple regex-based tools.
AST-Based Deterministic Parsing. MigryX parses SAS programs into a complete Abstract Syntax Tree, capturing every DATA step, PROC, macro definition, macro invocation, format reference, and libname assignment. Unlike regex-based tools that rely on pattern matching (and break on non-standard code), or AI-only approaches that hallucinate transformations, MigryX's parser achieves 95%+ accuracy by understanding the full syntactic and semantic structure of SAS programs. The parser handles SAS-specific constructs like implicit DATA step loops, RETAIN semantics, first./last. processing, ARRAY statements, hash objects, and nested macro expansions that span multiple source files.
Column-Level Lineage. MigryX traces data flow from source columns through every transformation to target columns, across DATA steps, PROC SQL queries, macro expansions, and cross-program references. This lineage is critical for validating that the Snowflake output produces the same results as the SAS original, and for satisfying regulatory requirements in industries like banking, insurance, and pharmaceuticals where data provenance must be documented.
Multi-Target Output. From a single SAS program, MigryX can generate Snowflake SQL, Snowpark Python, or a combination based on which target best fits the transformation pattern. Simple PROC SQL maps to Snowflake SQL. Complex DATA steps with RETAIN and first./last. logic map to Snowpark Python with window functions. Macros with dynamic SQL generation map to Snowflake stored procedures. This intelligent target selection ensures that each SAS construct is converted to the most maintainable Snowflake equivalent.
STTM Documentation. For every conversion, MigryX generates comprehensive Source-to-Target Mapping documentation that maps each SAS construct to its Snowflake equivalent, including data type mappings, format conversions, function substitutions, and semantic equivalence notes. This documentation supports validation testing, UAT sign-off, and regulatory audit trails.
On-Premise and Air-Gapped Deployment. Many SAS installations exist in regulated environments with strict data residency requirements — financial institutions, government agencies, healthcare organizations, and defense contractors. MigryX deploys on-premise or in fully air-gapped environments, processing SAS code without sending any data or source code to external services.
Merlin AI Assistant. MigryX's Merlin AI assistant provides interactive guidance during migration planning and execution. Merlin analyzes your SAS codebase to identify complexity hotspots, estimates conversion effort by program, flags SAS constructs that require special handling (like SAS/STAT procedures, SAS/GRAPH output, or ODS destinations), and answers questions about specific conversion patterns. Merlin augments MigryX's deterministic parser with AI-powered explanations and recommendations, combining the reliability of AST-based conversion with the flexibility of AI-assisted guidance.
Key Takeaways
- SAS DATA steps map to Snowpark Python DataFrames for row-level logic, or to SQL with window functions for RETAIN/first./last. patterns. MigryX's AST parser automatically selects the optimal target for each DATA step pattern.
- PROC SQL maps nearly 1:1 to Snowflake SQL, with specific conversions needed for the calculated keyword, SAS date literals, SELECT INTO macro variable creation, and MONOTONIC().
- SAS Macros map to Snowflake stored procedures (Snowflake Scripting or Snowpark Python), with macro variables becoming procedure arguments or session variables.
- SAS PROC FORMAT custom formats map to CASE expressions or lookup table JOINs in Snowflake. MigryX automatically resolves PUT() calls against format definitions.
- Statistical procedures (PROC MEANS, PROC FREQ, PROC SUMMARY) map to SQL aggregate functions, window functions, and PIVOT/UNPIVOT syntax.
- SAS scheduling (cron, LSF, SAS Management Console) is replaced by Snowflake Tasks with CRON scheduling and predecessor-based DAG dependencies.
- Dynamic Tables can replace entire SAS programs that exist solely to refresh derived datasets, eliminating both transformation code and scheduling infrastructure.
- Snowflake Time Travel provides automatic data versioning and recovery — a capability that does not exist in SAS.
- MigryX provides 95%+ parser accuracy, column-level lineage, STTM documentation, multi-target output, and on-premise deployment — addressing the specific requirements of regulated SAS migration projects.
Migrating from SAS to Snowflake is not merely a language translation exercise. It represents a fundamental architectural shift from a single-threaded, file-based processing model to a massively parallel, cloud-native data platform. The reward is dramatic: elastic scalability, consumption-based pricing, native semi-structured data support, automatic Time Travel for data recovery, and an open ecosystem of tools and integrations. For organizations with decades of SAS investment, a systematic, tool-assisted migration using MigryX ensures that business logic is preserved, column-level lineage is documented, and the transition to Snowflake is both accurate and auditable.
Why MigryX Delivers Superior Migration Results
The challenges described throughout this article are exactly what MigryX was built to solve. Here is how MigryX transforms this process:
- Production-ready output: MigryX generates code that passes code review and runs in production — not prototype-quality output that needs weeks of cleanup.
- Platform optimization: Converted code leverages target platform-specific features for maximum performance and cost efficiency.
- 25+ source technologies: Whether migrating from SAS, Informatica, DataStage, SSIS, or any of 25+ legacy technologies, MigryX handles it.
- Automated documentation: Every conversion decision is documented with before/after code mappings and transformation rationale.
MigryX combines precision AST parsing with Merlin AI to deliver 99% accurate, production-ready migration — turning what used to be a multi-year manual effort into a streamlined, validated process. See it in action.
Ready to migrate from SAS to Snowflake?
See how MigryX converts SAS DATA steps, PROC SQL, and macros to production-ready Snowpark Python and Snowflake SQL with column-level lineage and STTM documentation.
Explore Snowflake Migration | Schedule a Demo