Snowflake GES-C01 Exam
SnowPro® Specialty: Gen AI Certification Exam
1.A data application developer is tasked with building a multi-turn conversational AI application using
Streamlit in Snowflake (SiS) that leverages the COMPLETE (SNOWFLAKE.CORTEX) LLM function.
To ensure the conversation flows naturally and the LLM maintains context from previous interactions,
which of the following is the most appropriate method for handling and passing the conversation history?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
2.A Streamlit application developer wants to use AI_COMPLETE (the latest version of COMPLETE
(SNOWFLAKE.CORTEX)) to process customer feedback. The goal is to extract structured information,
such as the customer's sentiment, product mentioned, and any specific issues, into a predictable JSON
format for immediate database ingestion.
Which configuration of the AI_COMPLETE function call is essential for achieving this structured output
requirement?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
'AI_COMPLETE Structured Outputs' (and its predecessor 'COMPLETE Structured Outputs') specifically
allows supplying a JSON schema as the 'response_format' argument to ensure completion responses
follow a predefined structure. This significantly reduces the need for post-processing in AI data pipelines
and enables seamless integration with systems requiring deterministic responses. The JSON schema
object defines the structure, data types, and constraints, including required fields. While prompting the
model to 'Respond in JSON' can improve accuracy for complex tasks, the 'response_format' argument is
the direct mechanism for enforcing the schema. Setting 'temperature' to 0 provides more consistent
results for structured output tasks.
Option A is a form of prompt engineering, which can help but does not guarantee strict adherence as
'response_format' does.
Option B controls randomness and length, not output structure.
Option D is less efficient for extracting multiple related fields compared to a single structured output call.
Option E's 'guardrails' are for filtering unsafe or harmful content, not for enforcing output format.
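For illustration, a minimal sketch of this pattern; the customer_feedback table, feedback_text column, and model name are placeholders, not part of the exam item, and the exact argument layout should be verified against the current AI_COMPLETE documentation:

    SELECT AI_COMPLETE(
        model  => 'claude-3-5-sonnet',            -- illustrative model choice
        prompt => 'Extract the sentiment, product mentioned, and any issues from this feedback: ' || feedback_text,
        model_parameters => {'temperature': 0},   -- more consistent structured results
        response_format  => {
            'type': 'json',
            'schema': {
                'type': 'object',
                'properties': {
                    'sentiment': {'type': 'string', 'enum': ['positive', 'neutral', 'negative']},
                    'product':   {'type': 'string'},
                    'issues':    {'type': 'array', 'items': {'type': 'string'}}
                },
                'required': ['sentiment', 'product']
            }
        }
    ) AS structured_feedback
    FROM customer_feedback;                        -- hypothetical source table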
3.A Snowflake developer, AI_ENGINEER, is creating a Streamlit in Snowflake (SiS) application that will
utilize a range of Snowflake Cortex LLM functions, including
SNOWFLAKE.CORTEX.COMPLETE, SNOWFLAKE.CORTEX.CLASSIFY_TEXT, and
SNOWFLAKE.CORTEX.EMBED_TEXT_768. The application also needs to access data from tables
within a specific database and schema. AI_ENGINEER has created a custom role, app_dev_role, for the
application to operate under.
Which of the following privileges or roles are absolutely necessary to grant to app_dev_role for the
successful execution of these Cortex LLM functions and interaction with the specified database objects?
(Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,C
Explanation:
To execute Snowflake Cortex AI functions such as 'SNOWFLAKE.CORTEX.COMPLETE',
'SNOWFLAKE.CORTEX.CLASSIFY_TEXT', and 'SNOWFLAKE.CORTEX.EMBED_TEXT_768' (or their AI_-prefixed
counterparts), the role used by the application (app_dev_role in this case) must be granted the
'SNOWFLAKE.CORTEX_USER' database role. Additionally, for the Streamlit application to access any
database or schema objects (like tables for data input/output, or for the Streamlit app itself if it is stored as
a database object), the USAGE privilege must be granted on those specific database and schema
objects.
Option B, 'CREATE SNOWFLAKE.ML.DOCUMENT_INTELLIGENCE', is a privilege specific to creating
Document AI model builds and is not required for general Cortex LLM functions.
Option D, 'ACCOUNTADMIN', grants excessive privileges and is not a best practice for application roles.
Option E, 'CREATE COMPUTE POOL', is a privilege related to Snowpark Container Services for creating
compute pools, which is not directly required for running a Streamlit in Snowflake application that
consumes Cortex LLM functions.
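For reference, the corresponding grants might look like the following sketch; the database, schema, and table names are placeholders:

    GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE app_dev_role;
    GRANT USAGE ON DATABASE app_db TO ROLE app_dev_role;                     -- placeholder database
    GRANT USAGE ON SCHEMA app_db.app_schema TO ROLE app_dev_role;            -- placeholder schema
    GRANT SELECT ON TABLE app_db.app_schema.feedback TO ROLE app_dev_role;   -- data the app reads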
4.A data application developer is tasked with building a multi-turn conversational AI application using
Streamlit in Snowflake (SiS) that leverages the COMPLETE (SNOWFLAKE.CORTEX) LLM function. To
ensure the conversation flows naturally and the LLM maintains context from previous interactions, which
of the following is the most appropriate method for handling and passing the conversation history?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To provide a stateful, conversational experience with the 'COMPLETE (SNOWFLAKE.CORTEX)' function
(or its latest version, 'AI_COMPLETE'), all previous user prompts and model responses must be explicitly
passed as part of the prompt argument. This argument accepts an array of objects, where each object represents
a turn and contains a 'role' ('system', 'user', or 'assistant') and a 'content' key, presented in chronological
order. In Streamlit, 'st.session_state' is the standard and recommended mechanism for storing and
managing data across reruns of the application, making it ideal for maintaining chat history, by initializing
'st.session_state.messages = []' and appending messages to it.
Option A is incorrect because 'COMPLETE' does not inherently manage history from external tables.
Option B is incorrect as 'COMPLETE' does not retain state between calls; history must be explicitly
managed.
Option D is a less effective form of prompt engineering compared to passing structured history, as it loses
the semantic role distinction and can be less accurate for LLMs.
Option E describes a non-existent parameter for the 'COMPLETE' function.
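As a sketch of the history structure itself (model name and message contents are illustrative), the same array that a Streamlit app would accumulate in st.session_state can be passed directly as the prompt argument:

    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large2',
        [
            {'role': 'system',    'content': 'You are a helpful support assistant.'},
            {'role': 'user',      'content': 'My order has not arrived yet.'},
            {'role': 'assistant', 'content': 'I am sorry to hear that. What is the order number?'},
            {'role': 'user',      'content': 'It is ORD-1234.'}
        ],
        {'temperature': 0}
    ) AS response;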
5.A Streamlit application developer wants to use AI_COMPLETE (the latest version of COMPLETE
(SNOWFLAKE.CORTEX)) to process customer feedback. The goal is to extract structured information,
such as the customer's sentiment, product mentioned, and any specific issues, into a predictable JSON
format for immediate database ingestion.
Which configuration of the AI_COMPLETE function call is essential for achieving this structured output
requirement?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
'AI_COMPLETE Structured Outputs' (and its predecessor 'COMPLETE Structured Outputs') specifically
allows supplying a JSON schema as the 'response_format' argument to ensure completion responses
follow a predefined structure. This significantly reduces the need for post-processing in AI data pipelines
and enables seamless integration with systems requiring deterministic responses. The JSON schema
object defines the structure, data types, and constraints, including required fields. For complex tasks,
prompting the model to respond in JSON can improve accuracy, but the 'response_format' argument is
the direct mechanism for enforcing the schema. Setting 'temperature' to 0 provides more consistent
results for structured output tasks.
Option A is a form of prompt engineering, which can help but does not guarantee strict adherence as
'response_format' does.
Option B controls randomness and length, not output structure.
Option D, while 'AI_EXTRACT' (or 'EXTRACT_ANSWER') can extract information, using it multiple times
and then manually combining results is less efficient and less robust than a single 'AI_COMPLETE' call
with a structured output schema for multiple related fields.
Option E's 'guardrails' are for filtering unsafe or harmful content, not for enforcing output format.
6.
A.
B.
C. The USAGE privilege on the specific database and schema where the Streamlit application and its
underlying data tables are located.
D. The ACCOUNTADMIN role to ensure unrestricted access to all Snowflake Cortex features.
E. The CREATE COMPUTE POOL privilege to provision resources for the Streamlit application.
Answer: A,C
Explanation:
To execute Snowflake Cortex AI functions such as 'SNOWFLAKE.CORTEX.COMPLETE', 'CLASSIFY_TEXT
(SNOWFLAKE.CORTEX)', and 'EMBED_TEXT_768 (SNOWFLAKE.CORTEX)' (or their AI_-prefixed counterparts like 'AI_COMPLETE', 'AI_CLASSIFY',
'AI_EMBED'), the role used by the application (in this case, the application's custom role) must be granted the 'SNOWFLAKE.CORTEX_USER' database role. This role
includes the privileges to call these functions. Additionally, for the Streamlit application to access any
database or schema objects (like tables for data input/output, or for the Streamlit app itself if it is stored as
a database object), the 'USAGE' privilege must be granted on those specific database and schema
objects.
Option B, 'CREATE SNOWFLAKE.ML.DOCUMENT_INTELLIGENCE', is a privilege specific to creating
Document AI model builds and is not required for general Cortex LLM functions.
Option D, 'ACCOUNTADMIN', grants excessive privileges and is not a best practice for application roles.
Option E, 'CREATE COMPUTE POOL', is a privilege related to Snowpark Container Services for creating
compute pools, which is generally not directly required for running a Streamlit in Snowflake application
that consumes Cortex LLM functions via SQL, unless the LLMs themselves were deployed as services on
compute pools using Model Serving in Snowpark Container Services, which is not explicitly stated as the
method of LLM usage here.
7.A data engineer is building a Snowflake data pipeline to ingest customer reviews from a raw staging
table into a processed table. For each review, they need to determine the overall sentiment (positive,
neutral, negative) and store this as a distinct column. The pipeline is implemented using SQL with
streams and tasks to process new data.
Which Snowflake Cortex LLM function, when integrated into the SQL task, is best suited for this sentiment
classification and ensures a structured, single-label output for each review?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
To classify text into predefined categories, the 'CLASSIFY_TEXT (SNOWFLAKE.CORTEX)' function (or its updated version, 'AI_CLASSIFY') is purpose-built and
directly returns the classification label. This approach is more direct and efficient than using
'SENTIMENT()', which returns a numeric score; 'EXTRACT_ANSWER()', which extracts an answer to a question; or multiple 'AI_FILTER()' calls, which
return Boolean values. While 'COMPLETE()' could be prompted for classification, 'CLASSIFY_TEXT()' is a more specific, task-specific function
designed for this exact use case within the Cortex LLM functions.
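A minimal sketch of the classification step inside such a SQL task; the processed_reviews table, raw_reviews_stream stream, and column names are placeholders, not taken from the exam item:

    INSERT INTO processed_reviews (review_id, review_text, sentiment)
    SELECT review_id,
           review_text,
           SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
               review_text,
               ['positive', 'neutral', 'negative']
           )['label']::STRING                    -- single-label output from the returned object
    FROM raw_reviews_stream                      -- stream on the raw staging table
    WHERE METADATA$ACTION = 'INSERT';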
8.A financial services company is developing an automated data pipeline in Snowflake to process Federal
Reserve Meeting Minutes, which are initially loaded as PDF documents. The pipeline needs to extract
specific entities like the FED's stance on interest rates ('hawkish', 'dovish', or 'neutral') and the reasoning
behind it, storing these as structured JSON objects within a Snowflake table. The goal is to ensure the
output is always a valid JSON object with predefined keys.
Which AI_COMPLETE configuration, used within an in-line SQL statement in a task, is most effective for
achieving this structured extraction directly in the pipeline?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To ensure that LLM responses adhere to a predefined JSON structure, the 'AI_COMPLETE' function's
'response_format' argument, which accepts a JSON schema, is the most effective and direct method.
This mechanism enforces the structure, data types, and required fields, significantly reducing the need for
post-processing and ensuring deterministic, high-quality output. The AI-Infused Data Pipelines with
Snowflake Cortex blog highlights asking the LLM to create a JSON object for maximizing utility. While
setting ‘temperature' to 0 can improve consistency, it does not enforce a specific schema. Prompt
engineering (Option A) can help but does not guarantee strict adherence. Using multiple extraction calls
(Option D) is less efficient and robust for extracting multiple related fields than a single 'AI_COMPLETE
call with a structured output schema. Snowflake Cortex does not automatically infer and enforce a JSON
schema without explicit configuration (Option E).
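A sketch of how such a call could sit in-line in a task; the warehouse, schedule, model, stream, and table names are placeholders, and the schema simply mirrors the stance/reasoning fields described above:

    CREATE OR REPLACE TASK extract_fed_stance
        WAREHOUSE = transform_wh
        SCHEDULE  = '60 MINUTE'
        WHEN SYSTEM$STREAM_HAS_DATA('fed_minutes_stream')
    AS
    INSERT INTO fed_minutes_structured (doc_id, stance_json)
    SELECT doc_id,
           -- assumes the completion is returned as a JSON-formatted string
           PARSE_JSON(AI_COMPLETE(
               model  => 'mistral-large2',
               prompt => 'Identify the FED''s stance on interest rates and the reasoning behind it: ' || minutes_text,
               response_format => {
                   'type': 'json',
                   'schema': {
                       'type': 'object',
                       'properties': {
                           'stance':    {'type': 'string', 'enum': ['hawkish', 'dovish', 'neutral']},
                           'reasoning': {'type': 'string'}
                       },
                       'required': ['stance', 'reasoning']
                   }
               }
           ))
    FROM fed_minutes_stream
    WHERE METADATA$ACTION = 'INSERT';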
9.A data engineering team is building a pipeline in Snowflake that uses a SQL task to call various
Snowflake Cortex LLM functions (e.g., AI_COMPLETE, AI_EMBED) on large datasets of customer
interaction logs. The team observes fluctuating costs and occasional query failures, which sometimes halt
the pipeline. To address these issues and ensure an efficient, robust, and monitorable pipeline, which of
the following actions or considerations are essential? (Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,B,E
Explanation:
A. "Correct." The 'TRY function is designed to perform the same operation as but returns ‘NULL ‘ instead
of raising an error when the LLM operation cannot be performed. This is critical for building robust data
pipelines, as it prevents pipeline halts due to transient or specific LLM failures, allowing for more resilient
data processing. B. ‘ The view provides detailed information on token consumption and credit usage for
Snowflake Cortex LLM functions. Monitoring this view is essential for understanding cost drivers and
optimizing expenditure within AI pipelines. C. "Incorrect." Snowflake recommends executing queries that
call Cortex AISQL functions with a smaller warehouse (no larger than MEDIUM), as larger warehouses do
not necessarily increase performance but can lead to unnecessary costs. The LLM inference itself runs
on Snowflake-managed compute, not solely on the user's virtual warehouse compute size. D. ‘ Setting the
'temperature' parameter to 0 makes the LLM's output more deterministic and focused. While this can be
beneficial for consistency in certain tasks, it does not directly minimize token usage. Token usage is
primarily determined by the length of the input prompt and the length of the generated output, which can
vary regardless of ‘temperature'. E. "Correct." Encapsulating complex and potentially lengthy prompt logic
within a UDF CUSER DEFINED FUNCTION') makes the prompts more manageable, reusable, and
easier to integrate programmatically into SQL statements within a data pipeline. This improves code
organization and maintainability.
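Illustrative snippets for points A, B, and E; the interaction_logs table, its columns, the UDF, and the model name are placeholders, and the exact column list of the ACCOUNT_USAGE view should be checked against the documentation:

    -- E: keep prompt logic in a reusable SQL UDF
    CREATE OR REPLACE FUNCTION build_summary_prompt(txt VARCHAR)
    RETURNS VARCHAR
    AS
    $$
        'Summarize the following customer interaction in three short bullet points: ' || txt
    $$;

    -- A: resilient call that returns NULL instead of failing the pipeline
    SELECT log_id,
           SNOWFLAKE.CORTEX.TRY_COMPLETE('mistral-large2', build_summary_prompt(log_text)) AS summary
    FROM interaction_logs;

    -- B: monitor token and credit consumption of Cortex functions
    SELECT function_name, model_name,
           SUM(tokens) AS total_tokens, SUM(token_credits) AS total_credits
    FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY
    GROUP BY function_name, model_name;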
10.A data engineering team is setting up an automated pipeline in Snowflake to process call center
transcripts. These transcripts, once loaded into a raw table, need to be enriched by extracting specific
entities like the customer's name, the primary issue reported, and the proposed resolution. The extracted
data must be stored in a structured JSON format in a processed table. The pipeline leverages a SQL task
that processes new records from a stream.
Which of the following SQL snippets and approaches, utilizing Snowflake Cortex LLM functions, would
most effectively extract this information and guarantee a structured JSON output for each transcript?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To guarantee a structured JSON output for entity extraction, 'AI_COMPLETE()' (the updated version of 'COMPLETE()') with
the 'response_format' argument and a specified JSON schema is the most effective approach. This
mechanism enforces that the LLM's output strictly conforms to the predefined structure, including data
types and required fields, significantly reducing the need for post-processing and improving data quality
within the pipeline.
Option A requires multiple calls and manual JSON assembly, which is less efficient.
Option B relies on the LLM's 'natural ability' to generate JSON, which might not be consistently structured
without explicit ‘response_format’.
Option D uses 'SUMMARIZE()', which is for generating summaries, not structured entity extraction.
Option E involves external LLM API calls and Python UDFs, which, while possible, is less direct than
using native 'AI_COMPLETE' structured outputs within a SQL pipeline in Snowflake Cortex for this specific
goal.
11.A data team has implemented a Snowflake data pipeline using SQL tasks that process customer call
transcripts daily. This pipeline relies heavily on SNOWFLAKE.CORTEX.COMPLETE() (or its updated
alias) for various text analysis tasks, such as sentiment analysis and summary generation. Over time,
they observe that the pipeline occasionally fails due to LLM-related errors, and the compute costs are
higher than anticipated.
What actions should the team take to improve the robustness and cost-efficiency of this data pipeline?
(Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,C,D
Explanation:
A. "Correct." 'TRY_COMPLETE()' performs the same operation as 'COMPLETE()' but returns NULL on failure instead of
raising an error when the LLM operation cannot be performed. This is critical for building robust data
pipelines, as it prevents pipeline halts due to transient or specific LLM failures.
B. "Incorrect." Snowflake recommends executing queries that call Cortex AISQL functions with a smaller warehouse (no larger than
MEDIUM). Larger warehouses do not necessarily increase performance but can lead to unnecessary
costs, as the LLM inference itself runs on Snowflake-managed compute, not solely on the user's virtual
warehouse compute size.
C. "Correct." The 'SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY' view provides detailed information on token consumption and credit usage for
Snowflake Cortex LLM functions. Monitoring this view is essential for understanding cost drivers and
optimizing expenditure within AI pipelines.
D. "Correct." Snowflake Cortex AI functions incur compute
costs based on the number of tokens processed (both input and output). Optimizing prompt engineering
to be concise and effective directly contributes to reducing the number of tokens consumed and, therefore,
the associated costs.
E. "Incorrect." Setting the 'temperature' parameter to 1.0 makes the LLM's output
more diverse and random. While useful for creativity, it does not guarantee a reduction in token usage or
a lower error rate. For the most consistent results, setting 'temperature' to 0 is generally recommended.
12.A financial institution wants to develop a Snowflake-based pipeline to process call transcripts from
their customer support. The pipeline needs to perform two main tasks: first, "summarize very lengthy
technical support calls" (up to 20,000 tokens per transcript) into concise actionable insights, and second,
"classify the sentiment" of these calls as 'positive', 'neutral', or 'negative'. Given these requirements for
integration into SQL data pipelines, which combination of Snowflake Cortex functions and prompt
engineering considerations would be most appropriate?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
For summarizing very lengthy technical support calls (up to 20,000 tokens), a model with a sufficiently
large context window is essential. 'AI_COMPLETE' (the updated version of 'COMPLETE') offers flexibility for detailed summarization with
prompt engineering. A model like 'mistral-large2' has a context window of 128,000 tokens, making it
suitable for such long inputs. Encapsulating complex prompt logic within a SQL User Defined Function
(UDF) is a recommended practice for better management and reusability in data pipelines. For classifying
sentiment into predefined categories ('positive', 'neutral', 'negative'), 'AI_CLASSIFY' (the updated version of 'CLASSIFY_TEXT') is
purpose-built and directly returns the classification label.
A. 'SUMMARIZE()' is a generic summarization function, but
'AI_COMPLETE' with a large model provides more control for 'actionable insights'. 'SENTIMENT()' returns a numerical
score, requiring additional logic for categorical output.
C. 'SNOWFLAKE.CORTEX.EXTRACT_ANSWER()'
is designed to extract specific answers to questions, not to summarize text. Using it multiple times for
summarization would be inefficient and less effective. While 'AI_COMPLETE' can perform classification, 'AI_CLASSIFY' is the specialized
function for this task.
D. 'gemma-7b' has a context window of 8,000 tokens, which is insufficient for
processing calls up to 20,000 tokens, potentially leading to truncation or incomplete results.
E. 'AI_AGG()' and 'AI_SUMMARIZE_AGG()' are designed to aggregate insights or summaries across multiple rows or groups of
text, not to summarize a single, lengthy document. 'AI_FILTER()' returns a Boolean result, making it less suitable for
multi-category classification directly.
13.A data engineering team is designing a Snowflake data pipeline to automatically enrich a 'customer
issues’ table with product names extracted from raw text-based 'issue_description' columns. They want to
use a Snowflake Cortex function for this extraction and integrate it into a stream and task-based pipeline.
Given the 'customer_issues' table with an 'issue_id' and an 'issue_description' (VARCHAR) column, which of the following SQL snippets
correctly demonstrates the use of a Snowflake Cortex function for this data enrichment within a task,
assuming 'issue_stream' is a stream on the 'customer_issues' table?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
Option B correctly uses 'EXTRACT_ANSWER()' to pull specific information (the product name) from unstructured text, which is a
common data enrichment task. It also integrates with a stream ('issue_stream') by filtering for
'METADATA$ACTION = 'INSERT'' and uses a 'MERGE' statement, which is suitable for incremental
updates in a data pipeline by inserting new extracted data based on new records in the stream.
Option A uses 'COMPLETE()' for generating a response, not for specific entity extraction, and its prompt is less precise
for this task than 'EXTRACT_ANSWER'.
Option C uses 'SNOWFLAKE.CORTEX.CLASSIFY_TEXT' for classification, not direct entity extraction of
a product name, and attempts to update the source table directly, which is not ideal for adding new
columns based on stream data.
Option D proposes a stored procedure and task, which is a valid pipeline structure. However, the
EXTRACT_ANSWER call within the procedure only returns a result set and does not demonstrate the final
insertion or merging step required to persist the extracted data into an 'enriched_issues' table.
Option E uses 'EMBED_TEXT_768()' to generate vector embeddings, which is a form of data enrichment, but the scenario
specifically asks for 'product names' (a string value), not embeddings for similarity search.
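A sketch of the pattern described above; the enriched_issues target table and the question wording are illustrative, while issue_stream, issue_id, and issue_description come from the scenario:

    MERGE INTO enriched_issues AS t
    USING (
        SELECT issue_id,
               SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
                   issue_description,
                   'What product is mentioned in this issue?'
               )[0]['answer']::STRING AS product_name   -- first (highest-scoring) answer
        FROM issue_stream
        WHERE METADATA$ACTION = 'INSERT'
    ) AS s
    ON t.issue_id = s.issue_id
    WHEN MATCHED THEN UPDATE SET t.product_name = s.product_name
    WHEN NOT MATCHED THEN INSERT (issue_id, product_name) VALUES (s.issue_id, s.product_name);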
14.A retail company wants to implement an automated data pipeline in Snowflake to analyze daily
customer reviews. The goal is to enrich a 'product_reviews_sentiment' table with sentiment categories
(e.g., 'positive', 'neutral', 'negative') for each new review. They require the sentiment to be returned as a
JSON object for downstream processing and need the pipeline to handle potential LLM errors gracefully
without stopping. Assuming a stream 'new_reviews_stream' monitors a 'customer_reviews' table, which
approach effectively uses a Snowflake Cortex function for this scenario?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
Option C is the most effective approach for this scenario. It correctly uses 'SNOWFLAKE.CORTEX.TRY_COMPLETE',
which performs the same operation as 'COMPLETE' but returns NULL instead of raising an
error when the operation cannot be performed, making the pipeline more robust to LLM issues. The
'response_format' option ensures the output adheres to a specified JSON schema for structured
sentiment categories, meeting the requirement for structured output. This is integrated within a 'MERGE'
statement in a task for incremental processing of new data from the stream.
Option A suggests a Python UDF with 'COMPLETE'. While feasible, 'TRY_COMPLETE' is explicitly designed for graceful error handling in
pipelines, which 'COMPLETE' lacks by default.
Option B uses 'SNOWFLAKE.CORTEX.SENTIMENT', which returns a numeric score (e.g., 0.5424458),
not a categorical JSON object, requiring additional post-processing logic for categorization.
Option D uses 'AI_AGG' for summarization and 'AI_CLASSIFY' for classification. While 'AI_CLASSIFY' can
categorize, the request is for the sentiment of each review, and 'AI_AGG' would aggregate before classifying,
not fulfilling the individual review sentiment requirement.
Option E suggests a dynamic table, but dynamic tables currently do not support incremental refresh with
'COMPLETE' (or 'AI_COMPLETE') functions, making them unsuitable for continuous LLM-based
processing in this manner. Furthermore, 'COMPLETE' does not offer the graceful error handling of
'TRY_COMPLETE'.
15.A data architect is integrating Snowflake Cortex LLM functions into various data enrichment pipelines.
To ensure optimal performance, cost-efficiency, and accuracy, which of the following are valid best
practices or considerations for these pipelines?
A. When extracting specific entities from documents using 'AI_EXTRACT' or '!PREDICT', it is often more
effective to fine-tune a Document AI model for complex or varied document layouts rather than relying
solely on extensive prompt engineering for zero-shot extraction.
B. For tasks requiring deterministic JSON outputs, explicitly specifying a JSON schema using the
'response_format' argument with 'AI_COMPLETE' is crucial, and for OpenAI (GPT) models, including the
'required' field and setting 'additionalProperties' to 'false' in every node of the schema is a mandatory
requirement.
C. To manage costs effectively for LLM functions like 'AI_COMPLETE' in a pipeline, always use the
largest available warehouse size (e.g., 6XL Snowpark-optimized) to maximize throughput, as this directly
reduces the overall token processing time and cost.
D. When performing sentiment analysis on customer feedback using 'AI_SENTIMENT', it's best practice to
pass detailed, multi-turn conversation history to the function to enhance accuracy, similar to how
'AI_COMPLETE' handles conversational context.
E. For data enrichment involving classification with 'AI_CLASSIFY', using descriptive and mutually
exclusive categories in plain English, along with an optional clear task description, can significantly
improve classification accuracy.
Answer: A,B,E
Explanation:
Option A is correct. For extracting information from documents with complex or varied layouts, fine-tuning
a Document AI model can significantly improve results compared to relying solely on zero-shot extraction
and extensive prompt engineering. Document AI provides both zero-shot extraction and fine-tuning
capabilities, with fine-tuning recommended to improve results on specific document types.
Option B is correct. To ensure 'AI_COMPLETE' (or 'COMPLETE') returns responses in a structured JSON
format, it is essential to specify a JSON schema using the 'response_format' argument. For OpenAI (GPT)
models, specific requirements include setting 'additionalProperties' to 'false' in every node and ensuring
the 'required' field lists all property names.
Option C is incorrect. Snowflake explicitly recommends executing queries that call Cortex AISQL
functions (such as 'AI_COMPLETE') using a smaller warehouse, no larger than MEDIUM. Using larger
warehouses does not increase performance for these functions but will incur unnecessary compute costs.
The LLM inference itself is managed by Snowflake, and its performance isn't directly scaled by
warehouse size in the same way as traditional SQL queries.
Option D is incorrect. 'AI_SENTIMENT' (and 'SENTIMENT') is a task-specific function designed to return a
sentiment score for a given English-language text. Unlike 'AI_COMPLETE' (or 'COMPLETE'), which
supports multi-turn conversations by passing conversation history for a stateful experience,
'AI_SENTIMENT' processes individual text inputs and is not designed to leverage multi-turn context in the
same way for sentiment analysis.
Option E is correct. For classification tasks using 'AI_CLASSIFY' (or 'CLASSIFY_TEXT'), best practices
include using plain English for the input text and categories, ensuring categories are descriptive and
mutually exclusive, and adding a clear ‘task_description’ when the relationship between input and
categories is ambiguous. These guidelines significantly improve classification accuracy.
16.A Gen AI Specialist is tasked with implementing a data pipeline to automatically enrich new customer
feedback entries with sentiment scores using Snowflake Cortex functions. The new feedback arrives in a
staging table, and the enrichment process must be automated and cost-effective. Given the following
pipeline components, which combination of steps is most appropriate for setting up this continuous data
augmentation process?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
Option C is the most direct and efficient approach for continuously augmenting data with sentiment scores
in a Snowflake pipeline. 'SNOWFLAKE.CORTEX.SENTIMENT()' is a task-specific AI function designed for this purpose, returning an overall
sentiment score for English-language text. Integrating it directly
into a task that monitors a stream allows for automated, incremental processing of new data as it arrives
in the staging table. The source explicitly mentions using Cortex functions in data pipelines via the SQL interface.
Option A is plausible, but calling SENTIMENT directly in SQL within a task (Option C) is simpler and
avoids the overhead of a Python UDF if the function is directly available in SQL, which it is.
Option B, using a dynamic table, is not supported for Snowflake Cortex functions.
Option D, while powerful for custom LLMs, is an over-engineered solution and introduces more
complexity (SPCS setup, custom service) than necessary for a direct sentiment function.
Option E describes a manual, non-continuous process, which contradicts the requirement for an
automated pipeline.
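A minimal sketch of such a task; the warehouse, schedule, stream, and table/column names are placeholders:

    CREATE OR REPLACE TASK enrich_feedback_sentiment
        WAREHOUSE = xs_wh
        SCHEDULE  = '5 MINUTE'
        WHEN SYSTEM$STREAM_HAS_DATA('feedback_stream')
    AS
    INSERT INTO feedback_enriched (feedback_id, feedback_text, sentiment_score)
    SELECT feedback_id,
           feedback_text,
           SNOWFLAKE.CORTEX.SENTIMENT(feedback_text)   -- returns a score between -1 and 1
    FROM feedback_stream
    WHERE METADATA$ACTION = 'INSERT';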
17.A financial institution wants to automate the extraction of key entities (e.g., invoice number, total
amount, list of invoice items) from incoming PDF financial statements into a structured JSON format
within their Snowflake data pipeline. The extracted data must conform to a specified JSON schema for
seamless downstream integration.
Which Snowflake Cortex capabilities, when combined, can best achieve this data augmentation and
ensure schema adherence in a continuous processing pipeline?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B,D
Explanation:
18.A data engineering team aims to automatically classify incoming customer support requests into
predefined categories ('Technical Issue', 'Billing Inquiry', 'General Question') as part of their Snowflake
data ingestion pipeline. The goal is to achieve high classification accuracy while managing LLM inference
costs efficiently.
Which of the following strategies, when applied within a Snowflake data pipeline using Streams and Tasks,
would best contribute to meeting these objectives?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A
Explanation:
19.
Which of the following SQL snippets, when executed against a single invoice file like "invoice001.pdf",
correctly extracts and transforms the desired data, assuming 'json_content' holds the raw Document AI
output?
A)
B)
C)
D)
E)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
Option B correctly uses a Common Table Expression (CTE) to retrieve the raw JSON output from '!PREDICT' (which
is a Document AI method for extracting information from documents in a stage), leveraging 'GET_PRESIGNED_URL' to access the
document. It then accesses the 'invoice_number' and 'vendor_name' using '.value' syntax, appropriate for
values returned as an array containing a single object with a 'value' field, as shown in Document AI output
examples. The 'LATERAL FLATTEN' clause is correctly applied to expand the array of line items, and
'ARRAY_AGG' combined with 'ARRAY_TO_STRING' converts these items into a comma-separated
string. Finally, it groups by the single-value extracted fields.
Option A attempts to flatten the result multiple times or in an incorrect way within the SELECT statement
without a proper 'FROM' clause for the flattened data, leading to inefficient or incorrect aggregation.
Option C directly references a staged file path (@invoice_docs_stage/invoice001.pdf) without the
necessary 'GET_PRESIGNED_URL' function, which is required when calling '!PREDICT' with a file from a
stage. It also incorrectly assumes direct '.value' access for array-wrapped single values and does not
correctly transform the 'invoice_items' array into a string.
Option D's subquery for 'ARRAY_AGG' is syntactically problematic for direct column access from the outer
query without explicit 'LATERAL FLATTEN' at the top level.
Option E only extracts the 'ocrScore' from the document metadata and does not perform the requested
data transformations.
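A rough sketch of the general shape described above; the doc_ai_db.doc_ai_schema.invoice_model build, the stage, and the field names are assumptions based on the scenario, not the exam item's actual option:

    WITH raw AS (
        SELECT doc_ai_db.doc_ai_schema.invoice_model!PREDICT(
                   GET_PRESIGNED_URL(@invoice_docs_stage, 'invoice001.pdf'), 1
               ) AS json_content
    )
    SELECT json_content:invoice_number[0]:value::STRING                  AS invoice_number,
           json_content:vendor_name[0]:value::STRING                     AS vendor_name,
           ARRAY_TO_STRING(ARRAY_AGG(item.value:value::STRING), ', ')    AS invoice_items
    FROM raw,
         LATERAL FLATTEN(input => json_content:invoice_items) AS item
    GROUP BY 1, 2;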
20.A data engineer is designing an automated pipeline to process customer feedback comments from a
'new_customer_reviews' table, which includes a 'review_text' column. The pipeline needs to classify each
comment into one of three predefined categories: 'positive', 'negative', or 'neutral', and store the
classification label in a new 'sentiment_label' column.
Which of the following statements correctly describe aspects of implementing this data transformation
using 'SNOWFLAKE.CORTEX.CLASSIFY_TEXT' in a Snowflake pipeline?
A. The classification can be achieved by integrating a 'SELECT' statement that calls 'SNOWFLAKE.CORTEX.CLASSIFY_TEXT'
into an 'INSERT' or 'UPDATE' task.
B. Including an optional 'task_description'
can improve the accuracy of classification, especially if the relationship between text and categories is
ambiguous.
C. The cost for 'CLASSIFY_TEXT' is incurred based on the number of pages processed in the input
document.
D. The category list argument must contain exactly three unique categories for sentiment classification.
E. Both the input string to classify and the categories are case-sensitive, potentially yielding different results for
variations in capitalization.
Answer: A,B,E
Explanation:
Option A is correct. 'SNOWFLAKE.CORTEX.CLASSIFY_TEXT' classifies free-form text into categories
and returns an 'OBJECT' value (VARIANT) where the 'label' field specifies the category. This can be
extracted using ['label'] and seamlessly integrated into 'INSERT' or 'UPDATE' statements within a
pipeline task for data transformation.
Option B is correct. Adding a clear ‘task_description’ to the ‘options' argument for ‘CLASSIFY_TEXT’ can
significantly improve classification accuracy. This is particularly useful when the relationship between the
input text and the provided categories is ambiguous or nuanced.
Option C is incorrect. 'CLASSIFY_TEXT' incurs compute cost based on the number of tokens processed (both input and
output tokens), not on the number of pages in a document. Functions like 'AI_PARSE_DOCUMENT' bill
based on pages.
Option D is incorrect. The category list argument for 'CLASSIFY_TEXT' must contain at least two and at most 100
unique categories. It is not strictly limited to three for any classification task, including sentiment.
Option E is correct. Both the 'input' string to classify and the categories are case-sensitive, meaning that differences
in capitalization for either the input text or the category labels can lead to different classification results.
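For illustration, a minimal sketch combining options A, B, and E in one statement; the task_description wording is an example, while the table and column names come from the scenario:

    UPDATE new_customer_reviews
    SET sentiment_label = SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
            review_text,
            ['positive', 'negative', 'neutral'],
            {'task_description': 'Classify the overall sentiment a customer expresses about their purchase'}
        )['label']::STRING
    WHERE sentiment_label IS NULL;   -- only classify rows not yet labeled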
21.A Snowflake developer is tasked with enhancing a daily data pipeline. The pipeline processes raw text
descriptions of system events and needs to extract structured information, specifically the 'event_name' (string) and its
'severity_level' (string, restricted to 'low', 'medium', 'high', 'critical'). The output must be a strictly formatted JSON object,
ensuring data quality for downstream analytics.
Consider the following SQL snippet intended for this transformation:
Which of the following statements are correct regarding this implementation and best practices for using
'AI_COMPLETE' with structured outputs in a data pipeline?
A. The 'response_format' correctly defines the expected JSON structure, using 'enum' for 'severity_level'
and 'required' to ensure 'event_name' and 'severity_level' are always present if extracted.
B. Setting 'temperature' to '0.7' is optimal for ensuring the most consistent and deterministic JSON
outputs, especially for complex extraction tasks.
C. Using 'TRY_COMPLETE' instead of 'AI_COMPLETE' would allow the pipeline to gracefully handle cases where the
model fails to generate a valid JSON response by returning 'NULL' instead of an error.
D. The complexity of the JSON schema, particularly deep nesting, does not impact the number of tokens
processed and billed for 'AI_COMPLETE' Structured Outputs.
E. For all models supported by 'AI_COMPLETE' Structured Outputs, the 'additionalProperties' field must
be set to 'false' in every node of the schema, and the 'required' field must contain all property names.
Answer: A,C
Explanation:
Option A is correct. The 'response_format' argument with its JSON schema accurately specifies the
desired structured output for 'AI_COMPLETE'. It correctly uses the 'enum' keyword to restrict the possible
values for 'severity_level' and the 'required' field to mandate the presence of the 'event_name' and
'severity_level' fields in the output if the model can extract them. This reduces post-processing needs and
enables seamless integration.
Option B is incorrect. For the most consistent and deterministic results, especially when aiming for strict
JSON adherence in data pipelines, it is recommended to set the 'temperature' option to 0. A 'temperature' of
'0.7' would lead to more diverse and random output, which is generally undesirable for structured data
extraction where consistency is key.
Option C is correct. In a production data pipeline, 'TRY_COMPLETE' is preferred over 'AI_COMPLETE' for
robustness. If the model fails to generate a valid response (e.g., cannot adhere to the schema or
encounters an internal error), 'TRY_COMPLETE' returns NULL instead of raising an error, allowing the
pipeline to continue processing other records without interruption.
Option D is incorrect. The number of tokens processed (and thus billed) for 'AI_COMPLETE' Structured
Outputs does increase with schema complexity. Highly structured responses, especially those with deep
nesting, consume a larger number of input and output tokens.
Option E is incorrect. The specific requirements for 'additionalProperties' being 'false' and the 'required'
field containing all property names only apply to OpenAI (GPT) models when using Structured Outputs.
Other models do not strictly enforce these requirements, although including them may simplify schema
management across different model types.
22.A Gen AI Specialist is setting up their Snowflake environment to deploy a high-performance
open-source LLM for real-time inference using Snowpark Container Services (SPCS). They need to
create a compute pool that can leverage NVIDIA A10G GPUs to optimize model performance.
Which of the following SQL statements correctly creates a compute pool capable of supporting an
intensive GPU usage scenario, such as serving LLMs, while adhering to common configuration best
practices for a new, small-scale deployment in Snowpark Container Services?
A)
B)
C)
D)
E)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: D
Explanation:
Option D is correct. The 'GPU_NV_M' instance family is explicitly described as "Optimized for intensive
GPU usage scenarios like Computer Vision or LLMs/VLMs", providing 4 NVIDIA A10G GPUs. Setting 'MIN_NODES = 1'
and 'MAX_NODES = 1' is appropriate for a small-scale deployment, and 'AUTO_SUSPEND_SECS = 1800' (30 minutes) is a sound
practice for cost management during inactivity.
Option A is incorrect because it uses a generic CPU instance family, not a GPU instance suitable for LLMs.
Option B uses 'GPU_NV_S', which is a GPU instance and the "smallest NVIDIA GPU size available for Snowpark
Containers to get started". While functional, 'GPU_NV_M' is more directly aligned with "intensive GPU usage
scenarios like LLMs" as stated in the question. 'AUTO_RESUME = TRUE' is the default behavior.
Option C is incorrect because it uses a high-memory CPU instance family, not a GPU instance. Setting 'AUTO_SUSPEND_SECS = 0' means
the compute pool will not suspend automatically, which is generally not a best practice for a new,
small-scale deployment unless continuous availability is strictly required.
Option E uses a CPU instance family, making it unsuitable for GPU-accelerated workloads.
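For reference, a compute pool matching the configuration described above could be created roughly like this; the pool name is a placeholder:

    CREATE COMPUTE POOL llm_inference_pool
        MIN_NODES = 1
        MAX_NODES = 1
        INSTANCE_FAMILY = GPU_NV_M     -- 4 NVIDIA A10G GPUs, suited to LLM serving
        AUTO_RESUME = TRUE
        AUTO_SUSPEND_SECS = 1800;      -- suspend after 30 minutes of inactivity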
23.An ML engineer has developed a custom PyCaret classification model and wants to deploy it to
Snowpark Container Services (SPCS) for inference using the Snowflake Model Registry. The model
requires specific versions of 'pycaret', 'scipy', and 'joblib'. The engineer also wants to make the service
accessible via an HTTP endpoint.
Which of the following Model Registry and service creation steps are ‘most appropriate’ for the ML
engineer? (Select all that apply.)
A)
B)
C)
D)
E) Opt for warehouse deployment instead of SPCS, as PyCaret is not natively supported by Snowflake
and managing its dependencies in SPCS would be overly complex compared to a warehouse.
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,C,D
Explanation:
Option A is correct. When bringing an unsupported model type, such as PyCaret, you must define a
'ModelContext’ that refers to the serialized model file (e.g., a pickled file).
Option B is incorrect. For models deployed to Snowpark Container Services, 'conda_dependencies' are,
by default, obtained from 'conda-forge', not the Snowflake Anaconda channel, which is used for
warehouse deployments. Therefore, relying on the Snowflake Anaconda channel for SPCS deployment is
incorrect.
Option C is correct. While 'conda_dependencies' can be used for SPCS (resolved from 'conda-forge'),
'pip_requirements' are often a more direct and reliable way to specify dependencies for custom or less
common third-party Python packages, ensuring they are pulled directly from PyPI if not available in
'conda-forge'. The PyCaret example in the sources, while using 'conda_dependencies', represents a
specific case, and for broader 'custom third-party packages', pip is a strong choice.
Option D is correct. To make the deployed service accessible via an HTTP endpoint, the ingress option must be set to 'True'
when creating the service. Additionally, setting 'gpu_requests' to the appropriate number of GPUs is essential when deploying a model to
a GPU compute pool to ensure it leverages the GPU resources for inference.
Option E is incorrect. Snowpark Container Services is specifically designed to ease the restrictions of
warehouse deployment, allowing for the use of any packages (including PyPI) and enabling large models
to run on distributed clusters of GPUs, which is ideal for this scenario.
24.An ML engineering team is preparing to log a custom Python model to the Snowflake Model Registry.
This model has several Python package dependencies. The team wants to ensure the model can be
deployed optimally, either in a Snowflake warehouse or to Snowpark Container Services (SPCS),
depending on future needs. They are particularly concerned with how dependency specification impacts
deployment eligibility.
Which statements accurately describe how Snowflake handles model dependencies and determines
deployment eligibility for custom Python models logged in the Model Registry, particularly when
considering both Snowflake warehouse and Snowpark Container Services (SPCS) environments?
(Select all that apply.)
A. If all of a model's 'conda_dependencies' are available in the Snowflake conda channel, the model is
automatically deemed eligible to run in a warehouse.
B. For models intended for SPCS, ‘pip_requirements’ are always preferred over ‘conda_dependencies’
because SPCS strictly prohibits the use of any conda packages from ‘conda-forge’
C. When ‘conda_dependencies’ are specified for a model to be deployed to SPCS, these dependencies
are by default obtained from ‘conda-forge’ rather than the Snowflake conda channel.
D. The 'log_model' function will fail if 'WAREHOUSE' is specified in 'target_platforms' but the model's size or GPU
requirements make it ineligible for warehouse deployment.
E. Specifying both ‘conda_dependencies’ and ‘pip_requirements’ for a model is recommended to cover all
possible deployment scenarios, and Snowflake's build process ensures compatibility between them.
Answer: A,C,D
Explanation:
Option A is correct. When a model version is logged using 'reg.log_model', its 'conda_dependencies' are
validated against the Snowflake conda channel. If all dependencies are found there, the model is
considered eligible to run in a warehouse.
Option B is incorrect. Snowpark Container Services models, by default, obtain their
'conda_dependencies' from 'conda-forge'. Therefore, SPCS does not prohibit conda packages from
'conda-forge'.
Option C is correct. The Snowflake documentation explicitly states that for models running on Snowpark
Container Services (SPCS), ‘conda-forge’ is the assumed channel for ‘conda_dependencies’, while the
Snowflake conda channel is for warehouse deployments only.
Option D is correct. If the 'WAREHOUSE' platform is specified in the 'target_platforms' argument of 'log_model', and
the model is ineligible for warehouse deployment (e.g., due to its size, dependencies, or GPU
requirements), the call will fail.
Option E is incorrect. Snowflake recommends using either 'conda_dependencies' or 'pip_requirements', but not both
simultaneously. This is because combining both can lead to package conflicts, causing the container
image to build successfully but potentially resulting in an unexpected or non-functional container image.
25.An ML engineer is preparing a Docker image for a custom LLM application that will be deployed to
Snowpark Container Services (SPCS). The application uses a mix of packages, some commonly found in
the Snowflake Anaconda channel and others from general open-source repositories like PyPI. They have
the following Dockerfile snippet and need to ensure the dependencies are correctly installed for the
SPCS environment to support a GPU workload.
Which of the following approaches for installing Python packages in the Dockerfile would ensure a robust
and compatible setup for a custom LLM running in Snowpark Container Services, based on best practices
for managing dependencies in this environment?
A)
B)
C)
D)
E)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
Option B is correct. The provided Dockerfile example for deploying Llama 2 in Snowpark Container
Services explicitly uses 'conda install -n rapids -c https://repo.anaconda.com/pkgs/snowflake' to install
Snowflake-specific packages like 'snowflake-ml-python' and 'snowflake-snowpark-python' from the
Snowflake Anaconda channel. It then uses 'pip install' for other open-source libraries that are not
available or preferred from the Anaconda channels.
Option A is incorrect because while pip can install many packages, the provided example demonstrates
using 'conda' from the Snowflake Anaconda channel for certain foundational packages.
Option C is incorrect because while 'conda-forge' is a common channel for open-source packages, the
specific Snowflake-related packages in the example are pulled directly from the
'https://repo.anaconda.com/pkgs/snowflake' channel. Although the source notes that 'conda-forge' is
assumed for 'conda_dependencies' when building container images, a Dockerfile explicitly defining
'RUN conda install' can specify the channel, which the example demonstrates.
Option D is incorrect because the 'defaults' channel often requires user acceptance of Anaconda terms,
which is not feasible in an automated build environment.
Option E is a generic approach for pip dependencies but doesn't specifically address the recommended
use of 'conda' from the Snowflake Anaconda channel for certain core Snowflake packages as shown in
the practical example.