
Is Generative AI Ready for Primetime at Your Organization?

Two easy real-world use cases for new capabilities.

When ChatGPT was released publicly in November 2022, less than a year and a half ago, nobody could have predicted how quickly it would reshape the world. ChatGPT and similar generative AI capabilities have become a staple of everyday life for regular people and the highest levels of business. Since then, major software and hardware vendors have capitalized on GenAI in numerous ways—some more successful than others. Most data-driven organizations have dipped their toes into AI somehow, but few have integrated AI into their data stack, and almost none have used GenAI in this way.

For this reason, I was excited to finally get my hands on Snowflake’s GenAI tools this year. The two most exciting ones are Snowflake Document AI, which trains on unstructured data like images and PDF files, and Snowflake Cortex, their easy button for GenAI. Over several months of experimentation, I explored these features: I built functions, learned the tools, and created demos. At the end of the process, I had several use cases that could immediately benefit companies’ data operations. Today, I’ll share two that demonstrate the concepts. But first, some technical basics on how to get started.

Preparing for GenAI Success

Before we discuss the two GenAI use cases, let’s talk about how to set yourself up for success. Both use cases rely on Streamlit, a lightweight Python-based UI framework, for data input and for output such as charts and tables. Streamlit comes in two versions: one runs outside of Snowflake on a web server, and the other runs inside Snowflake and is governed by it.

In the past, using SQL Server, I begrudgingly found myself reaching for Microsoft Access for data entry and lightweight reporting. Now, I simply create Streamlit apps. Even more practical, developers can create reusable Streamlit apps and share them with their teams instead of just sharing code snippets.

How a Streamlit App Works

I’ll share a quick solution that arose while I was trying to generate sample data that included Personally Identifiable Information (PII) for testing. I often need a large data set of values like names and Social Security numbers to test data systems. I can’t use real data sets for obvious reasons, but the data needs to look real, following the patterns used by names, phone numbers, and Social Security numbers.

Before Streamlit, I would write a script and save it in Git or on my computer. This system worked, but the script was difficult to maintain and hard to find when I needed it again. With Streamlit, I created a simple UI that lets me choose the database and schema I want to use, name the file, and select the number of sample records I want to create.

After clicking Generate Data, the app creates the table and returns the first five rows.

I can also share this app with other Snowflake users and even give it to users without access to anything else in Snowflake.
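To make the idea concrete, here is a minimal sketch of the kind of generator such an app could call behind its Generate Data button. It is not the code from my app: the name pools, the column names, and the function names are all assumptions for illustration. It uses SSN area numbers 900 through 999, which are not issued, and the 555-01XX phone range reserved for fictional use, so the values look real without matching a real person.

```python
import random

# Small illustrative name pools (assumed); a real app would likely use larger lists.
FIRST_NAMES = ["Ava", "Liam", "Noah", "Mia", "Zoe", "Eli"]
LAST_NAMES = ["Nguyen", "Garcia", "Smith", "Patel", "Kim", "Lopez"]


def fake_ssn(rng):
    """Return an SSN-shaped string (AAA-GG-SSSS) that cannot be a real SSN."""
    # Area numbers 900-999 are not issued as SSNs.
    return f"{rng.randint(900, 999)}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"


def fake_phone(rng):
    """Return a US-style phone number in the fictional 555-01XX range."""
    # The area code is just three random digits and is not checked against real ones.
    return f"({rng.randint(200, 999)}) 555-01{rng.randint(0, 99):02d}"


def generate_rows(n, seed=None):
    """Build n sample records as dictionaries keyed by (hypothetical) column name."""
    rng = random.Random(seed)
    return [
        {
            "NAME": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
            "PHONE": fake_phone(rng),
            "SSN": fake_ssn(rng),
        }
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Preview the first five rows, as the app does after generating data.
    for row in generate_rows(5, seed=42):
        print(row)
```

Passing a seed makes the sample reproducible, which helps when a test needs the same data set twice. Writing the rows to a Snowflake table is left out here because it depends on how your app connects.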

Snowflake Cortex is a fully managed service that provides several state-of-the-art ML and AI solutions within your Snowflake environment, giving you greater security, scalability, and governance capabilities. This is Snowflake’s easy button for GenAI, which allows you to take advantage of the power of LLMs in as little as one line of SQL.

Use Case 1: Data Completion

We’ll start with an app that can enrich company data with AI-generated answers. Using just the table below and Snowflake Cortex, you can enrich your data with answers to plain-language questions like “Who is the CEO of this company?” and “How much did this company sell last year?”

 

This single line of SQL code takes the company name and adds the CEO:

SNOWFLAKE.CORTEX.COMPLETE('gemma-7b',concat('Who is the CEO of this company:',NAME,', answer only with the CEO name'))

Here is what the final app looks like:

 

This solution works at scale, allowing you to work on your entire data set, not just a few problem accounts. As with all generative AI, it’s a good idea to trust but validate the answers and understand that recent changes may not be included in the LLM and its output.
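One lightweight way to do the validation step is to flag answers whose shape looks wrong before anyone relies on them. The sketch below is an assumption of mine, not part of Cortex: the function name, the word limit, and the list of hedging phrases are all hypothetical and should be tuned to your data. It only catches formatting problems; it cannot tell whether a well-formed answer is factually right.

```python
# Phrases that suggest the model did not give a direct answer (assumed list).
HEDGE_PHRASES = ("i don't know", "i do not know", "not sure", "as an ai", "unable to")


def flag_for_review(answer, max_words=6):
    """Return the reasons an AI-generated answer deserves a human look.

    An empty list means the answer passed these basic shape checks.
    """
    text = (answer or "").strip()
    if not text:
        return ["empty answer"]

    reasons = []
    if "\n" in text:
        reasons.append("multi-line answer")
    if len(text.split()) > max_words:
        # The prompt asked for a name only, so a long reply is suspicious.
        reasons.append("longer than expected")
    lowered = text.lower()
    if any(phrase in lowered for phrase in HEDGE_PHRASES):
        reasons.append("model hedged or refused")
    return reasons


if __name__ == "__main__":
    for candidate in ["Jane Doe", "", "I'm not sure who the CEO is, but it might be Jane Doe"]:
        print(repr(candidate), "->", flag_for_review(candidate))
```

Rows that come back with any reason attached can go to a review queue, while clean rows move on to spot checks against a trusted source.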

This use case arose when clients wanted to clean up and enrich their Salesforce data, where only about 25% of their customers’ websites were populated. With Snowflake Cortex, they could populate the missing information in minutes.
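For a cleanup like this, it makes sense to send only the rows with missing values to the LLM. Here is a minimal sketch of a helper that builds such a query as a string. The table name, column names, question wording, and the _SUGGESTED alias are hypothetical; the model name and the COMPLETE call follow the one-line example above. The helper does not connect to Snowflake, so you would run the returned SQL through whatever session your app already has.

```python
import re

# One unquoted identifier part, such as ACCOUNTS or NAME.
_PART = r"[A-Za-z_][A-Za-z0-9_$]*"


def safe_identifier(name, max_parts=1):
    """Reject anything that is not a plain (optionally dotted) identifier."""
    parts = name.split(".")
    if len(parts) > max_parts or not all(re.fullmatch(_PART, p) for p in parts):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def sql_literal(text):
    """Wrap text in single quotes, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def build_fill_missing_query(table, name_col, target_col, question, model="gemma-7b"):
    """Build a SELECT that asks the LLM only about rows where target_col is NULL."""
    table = safe_identifier(table, max_parts=3)  # allows DB.SCHEMA.TABLE
    name_col = safe_identifier(name_col)
    target_col = safe_identifier(target_col)
    prefix = sql_literal(question + " ")
    suffix = sql_literal(", answer only with the value")
    prompt = f"CONCAT({prefix}, {name_col}, {suffix})"
    return (
        f"SELECT {name_col}, "
        f"SNOWFLAKE.CORTEX.COMPLETE({sql_literal(model)}, {prompt}) AS {target_col}_SUGGESTED "
        f"FROM {table} WHERE {target_col} IS NULL"
    )


if __name__ == "__main__":
    print(build_fill_missing_query(
        "CRM.PUBLIC.ACCOUNTS", "NAME", "WEBSITE",
        "What is the official website of this company:",
    ))
```

Returning suggestions in a separate column, rather than updating the table directly, keeps the AI-generated values apart from the source data until someone has reviewed them.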

Use Case 2: Translation

Translation is another favorite use case of mine. As a consultant, I frequently work on projects with data in other languages for international companies. Solutions like Google Translate can help users manage translations of a single text, but translating at scale is much more challenging. Using Cortex Translate, you can easily translate to and from 12 different languages within your data stack, as shown in this Streamlit app.

Like Cortex Complete, this was accomplished with a single line of SQL:

select C_COMMENT, SNOWFLAKE.CORTEX.TRANSLATE(C_COMMENT, 'en', 'es') as C_COMMENT_ES from customer

In a current project, one of a company’s data sources is a local application that stores the data in Mandarin. When I first heard this request, I worried I couldn’t complete the project since I didn’t know Mandarin and couldn’t learn it in time. Unfortunately, Snowflake Cortex Translate only supports 12 languages, and Mandarin was not one of them, so I sought other solutions and found Azure AI Translate. With Azure AI Translate, I could write Python scripts to translate from Mandarin to English, but it was not as seamless as writing a query to translate the data inside of Snowflake.

I decided to explore Snowflake Cortex Complete further with different LLMs to see if there was a better way to translate from Mandarin, and I was able to write this very simple query.

SELECT '材料成本差异(计划价格法)' as original, SNOWFLAKE.CORTEX.COMPLETE('mistral-large', 'translate this to English: 材料成本差异(计划价格法) just give me the translation no other words, do not enclose in double quotes') as translation;

The result:

With this solution, I have the power to work directly with client data in a Snowflake query, bypassing the need for external function calls. This empowers me to provide a faster and better-governed solution, all within the Snowflake environment.
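The query above translates a single literal, but the same prompt can be applied to a whole column. Here is a minimal sketch of a helper that picks between the two approaches: it uses Cortex Translate when the source language is in a set you pass in, and falls back to the Cortex Complete prompt otherwise. The function name, the column names, and the set of language codes are assumptions for illustration; check the Snowflake documentation for the languages Translate actually supports.

```python
import re

# Prompt wording taken from the Mandarin query above, split around the column.
PROMPT_PREFIX = "translate this to English: "
PROMPT_SUFFIX = " just give me the translation no other words, do not enclose in double quotes"


def build_translation_expr(column, source_lang, translate_codes, model="mistral-large"):
    """Return a SQL expression that translates `column` into English.

    translate_codes is the caller-supplied set of language codes that
    Cortex Translate should handle; anything else goes to Cortex Complete.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", column):
        raise ValueError(f"unsafe column name: {column!r}")
    if not re.fullmatch(r"[A-Za-z-]+", source_lang):
        raise ValueError(f"unexpected language code: {source_lang!r}")

    if source_lang in translate_codes:
        return f"SNOWFLAKE.CORTEX.TRANSLATE({column}, '{source_lang}', 'en')"

    prompt = f"CONCAT('{PROMPT_PREFIX}', {column}, '{PROMPT_SUFFIX}')"
    return f"SNOWFLAKE.CORTEX.COMPLETE('{model}', {prompt})"


if __name__ == "__main__":
    supported = {"en", "es"}  # hypothetical subset, for illustration only
    print(build_translation_expr("C_COMMENT", "es", supported))
    print(build_translation_expr("LINE_DESC", "zh", supported))
```

Either expression can be dropped into a select list, so the rest of the query stays the same regardless of which function does the translating.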

Generative AI Has a Lot to Offer Data Leaders

These are two small examples of how GenAI can immediately be integrated into a company’s data stack for full-fledged use—but they’re far from the only ones. In follow-up articles, we’ll discuss more use cases and how they can apply to companies in all sectors. Meanwhile, if you’re curious about any of these offerings, please reach out.