By Nikolai Kuznetsov, August 3, 2023

The premise of blockchain is that the data it processes cannot be tampered with. So far, though, the throughput of major blockchains has been tiny compared with the data volumes businesses have to work with.

What’s more, blockchains can’t even access their own data efficiently. Ethereum archive nodes require between 3 and 12 terabytes of storage (depending on the client implementation), but smart contracts simply cannot access that history without spending a fortune on gas.

A number of projects have built indexers to address the inefficiencies of blockchain data archival, solving some immediate needs such as DEX analytics pages. But a new kid on the block, Space and Time, is working to take this concept to a whole new level.

Space and Time built a comprehensive “data warehouse” system around a cryptographic protocol called Proof of SQL, which is designed to scale data verification to hundreds of terabytes. As AI evolves, this becomes ever more critical, even outside of Web3.

We sat down with Scott Dykstra, the CTO of Space and Time, to learn more about how it aims to use blockchain to its fullest potential.

Hi Scott, nice to meet you. Let’s start with a basic introduction to the topic. Why does the world need verifiable and tamper-proof data? Is this a use case only for Web3, or do you see it branching out into other fields?

SD: I think verifying that data hasn't been tampered with, and that computation on data was done correctly, is becoming even more important as we enter a chaotic, AI-driven world. It comes down to simply having confidence that the systems we rely on are provably neutral, transparent and untampered. The use case is very clear in Web3, where a global network of node operators pools its compute resources to power a provable network, but it’s also important outside of Web3, across several industries.

You want to know that financial systems like banks aren’t being manipulated. You want trading systems like stocks or crypto to be transparent and traceable. You want businesses sharing sensitive data—like patient healthcare data or accounting records—to be able to do so in a privacy-preserving, but provably untampered way.

If you had the ability to provably verify that both data and the processing of data haven’t been tampered with, and you could do so with familiar database tools at a reasonable cost, why wouldn't you?

You mentioned AI becoming ever more prevalent in our lives. Many people worry that we can’t see its inner workings, so the conversation around it often focuses on “aligning” it and controlling its outputs. Do you think that is an achievable goal, and if so, how?

SD: When we classify the size of a large language model, we’re talking about billions of parameters. There are, I believe, 70 billion parameters in the largest version of Llama 2, the most popular open-source LLM. You can think of those parameters as 70 billion data points in a database. If the training of your large language model is fed directly from a verifiable database holding those data points, then you can train the AI model in a trustless, transparent way.

You can prove that the data fed into the model for training wasn't tampered with if it's fed directly from Space and Time with verifiable circuitry. The same principle applies to the model's weights and settings. Of course, that becomes more complicated, because we have to prove that they weren't tampered with later, after leaving Space and Time.

Speaking of AI, Space and Time recently released a chatbot to automatically generate SQL queries for your system. Tell us more about that, and why you decided to release it. Do you think current LLMs are precise enough for this kind of usage?

SD: Fortunately, yes. We were shocked by the quality of SQL returned by GPT-4 when we provided the necessary context. We built a system where the user provides a simple prompt, and we send that prompt to GPT-4 along with all of the context of the database the user is working in: tables, columns, foreign keys and SQL syntax examples. Customers are reporting 80% to 90% accuracy for queries under about 30 lines of SQL, so for queries of that complexity, it’s incredibly accurate.
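The prompt assembly described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Space and Time's code: the `Table` class, the `build_sql_prompt` function and the sample `blocks`/`transactions` schema are all hypothetical, and the call that would send the prompt to GPT-4 is deliberately left out, since the interview doesn't say which API or settings are used.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Schema context for one table (hypothetical structure for illustration)."""
    name: str
    columns: list[str]
    # (column, "other_table.column") pairs describing foreign keys
    foreign_keys: list[tuple[str, str]] = field(default_factory=list)


def build_sql_prompt(question: str, tables: list[Table], examples: list[str]) -> str:
    """Combine the user's question with tables, columns, foreign keys and example SQL."""
    lines = ["Write one SQL query for the database described below.", "", "Tables:"]
    for t in tables:
        lines.append(f"- {t.name}({', '.join(t.columns)})")
        for col, target in t.foreign_keys:
            lines.append(f"  foreign key: {t.name}.{col} -> {target}")
    lines += ["", "Example queries in this SQL dialect:"]
    lines += [f"  {ex}" for ex in examples]
    lines += ["", f"Question: {question}", "SQL:"]
    return "\n".join(lines)


tables = [
    Table("blocks", ["block_number", "time_stamp"]),
    Table(
        "transactions",
        ["tx_hash", "block_number", "from_address", "value"],
        foreign_keys=[("block_number", "blocks.block_number")],
    ),
]
prompt = build_sql_prompt(
    "How many transactions were in the latest 100 blocks?",
    tables,
    ["SELECT COUNT(*) FROM blocks;"],
)
print(prompt)  # this string would then be sent to the LLM (call omitted)
```

The point of the design is that the model never has to guess table or column names: everything it may reference is spelled out in the prompt, which is the kind of context Dykstra credits for the accuracy.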

We built this system to solve what we saw as an important, widespread and unsolved problem: no one likes writing SQL to query databases. In my opinion, it’s the perfect use case for GPT-4, which is not quite advanced enough to write code without a lot of human intervention. But it’s very good at writing SQL. And people are not.

Space and Time is unique among its peers because it verifies data through cryptographic proofs that you call Proof of SQL. Could you give us an ELI5 on how it works? How can ZK proofs guarantee that both the query and its data are correct?

SD: The key is that we take a digital fingerprint of data as soon as it’s ingested into Space and Time, regardless of where it came from—regardless of whether it's data we've collected from major blockchains, data streamed in from video game servers or TradFi markets, or data inserted by an app.

We capture a digital fingerprint (almost like a fancy hash of the data, if you will), then put it on-chain in a smart contract. All of the raw data is loaded into our data warehouse, but the fingerprint is small enough to be stored on-chain affordably. Then, when you query the data, the data warehouse generates the query result along with a cryptographic proof, called a ZK proof, showing that not only has the underlying data not been tampered with, but neither has the actual processing of the data.
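To make the fingerprint idea concrete, here is a sketch that uses a plain SHA-256 digest as a stand-in. This is an assumption for illustration only: Dykstra describes the real fingerprint as merely "almost like" a hash, and the `fingerprint` helper and row layout below are hypothetical. A plain hash can reveal that data changed, but checking it requires rehashing all of the data, which is exactly the cost a ZK proof is meant to avoid.

```python
import hashlib
import json


def fingerprint(rows: list[dict]) -> str:
    """Hash rows in the order given; identical data always yields the same digest."""
    h = hashlib.sha256()
    for row in rows:
        # sort_keys makes each row's encoding independent of dict key order
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        h.update(b"\n")  # row separator
    return h.hexdigest()


# At ingest time: compute the digest once; only these 64 hex characters
# (32 bytes) would need to be stored on-chain.
rows = [
    {"wallet": "0xabc", "token_id": 1},
    {"wallet": "0xdef", "token_id": 2},
]
stored = fingerprint(rows)

# Later: anyone holding the data can recompute the digest and compare.
assert fingerprint(rows) == stored

# Changing a single value produces a different digest, exposing the edit.
tampered = [dict(r) for r in rows]
tampered[1]["wallet"] = "0x999"
assert fingerprint(tampered) != stored
```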

Finally, we send the query result, the proof and the digital fingerprints to a smart contract, and checking them is easy and computationally lightweight. A smart contract can do some quick math to check the query result against the proof. It doesn’t even have to be a smart contract that verifies it, either; it could be an iPhone running a client library, a banking system or trusted third-party auditors.
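Why a verifier can work from small fingerprints rather than the raw data can be illustrated with a toy fingerprint that is linear: the fingerprint of the sum of two columns equals the sum of their fingerprints. The sketch below evaluates each column as a polynomial at a point `R` modulo a prime `P`. The constants, the function names and the `price`/`fee` columns are hypothetical, and this is not Proof of SQL. Because `R` is public here, a dishonest server could forge a matching result, so real systems rely on cryptographic commitment schemes (Pedersen commitments, for example, are additively homomorphic in a similar way) together with a proof.

```python
P = 2**61 - 1  # a Mersenne prime; all arithmetic is modulo P
R = 1234567    # evaluation point; fixed here only so the example is reproducible


def poly_fingerprint(column: list[int]) -> int:
    """Return sum(column[i] * R**i) mod P; the map is linear in the column."""
    acc = 0
    for value in reversed(column):  # Horner's rule
        acc = (acc * R + value) % P
    return acc


def check_sum_column(result: list[int], fp_a: int, fp_b: int) -> bool:
    """Accept `result` as a + b only if its fingerprint matches fp(a) + fp(b)."""
    return poly_fingerprint(result) == (fp_a + fp_b) % P


# Ingest: fingerprints of two columns are stored (e.g. on-chain).
price = [100, 250, 75]
fee = [3, 5, 2]
fp_price = poly_fingerprint(price)
fp_fee = poly_fingerprint(fee)

# Query: SELECT price + fee AS total. An honest server returns:
total = [p + f for p, f in zip(price, fee)]

assert check_sum_column(total, fp_price, fp_fee)
assert not check_sum_column([103, 255, 78], fp_price, fp_fee)  # altered result
```

Note that the check never touches `price` or `fee` themselves, only their stored fingerprints and the returned `total` column.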

We’ve built this framework for verification so that anyone can do it—it’s not just applicable to Web3. The real key is those digital fingerprints.

So with Space and Time, smart contracts can make direct queries to verifiable data warehouses, right? What are the use cases for this kind of feature?

SD: We believe that the next wave of Web3 is data-driven financial services, where smart contracts have the ability to store and process a large volume of data and answer very complex questions about activity on their own chain, or elsewhere.

Today, smart contracts can’t even answer basic questions like, “show me all wallets that own two of this NFT.” For the next wave of Web3, they’ll need to be able to answer much more complex questions like, “what's the implied volatility of Tesla stock?” Or “what's the United States' current risk-free interest rate?” These are important financial primitives that require computation over historical data.
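For a sense of what the NFT question looks like as a query, here is a sketch run against an in-memory SQLite database. The `nft_owners` table, its columns and the sample wallets are hypothetical; on a real chain, ownership would first have to be derived from the collection's transfer history, which is the kind of indexing described earlier.

```python
import sqlite3

# Hypothetical table: one row per NFT token, recording its current owner.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nft_owners (collection TEXT, token_id INTEGER, owner TEXT)")
conn.executemany(
    "INSERT INTO nft_owners VALUES (?, ?, ?)",
    [
        ("0xCollection", 1, "0xalice"),
        ("0xCollection", 2, "0xalice"),
        ("0xCollection", 3, "0xbob"),
        ("0xCollection", 4, "0xcarol"),
        ("0xCollection", 5, "0xcarol"),
        ("0xCollection", 6, "0xcarol"),
        ("0xOther", 7, "0xbob"),
    ],
)

# "Show me all wallets that own two of this NFT": group by owner within
# one collection and keep owners holding exactly two tokens.
query = """
    SELECT owner, COUNT(*) AS tokens_held
    FROM nft_owners
    WHERE collection = ?
    GROUP BY owner
    HAVING COUNT(*) = 2
    ORDER BY owner
"""
rows = conn.execute(query, ("0xCollection",)).fetchall()
print(rows)  # [('0xalice', 2)]
```

The question is ambiguous: `HAVING COUNT(*) = 2` returns wallets holding exactly two tokens, while `>= 2` would also include `0xcarol`, who holds three.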

The next wave of financial services will require that smart contracts be able to ask arbitrary questions, and answering those questions means processing large volumes of data. Space and Time can handle that processing.

How do you see the industry benefiting from Space and Time and Proof of SQL in the future? What is your Utopian vision, if you will, of how the product will change our lives?

SD: Data warehouses power the world’s business, but centralized cloud services are extremely costly. A decentralized data warehouse means that the community can contribute compute—anyone can stand up a database server, lend it to the network and get paid for the work it does, which drives down the cost of database computing significantly.

We can offer a much more affordable service with performance similar to popular cloud data warehouses. But doing so requires Proof of SQL: if we allow anybody in the world to contribute servers, we have to prove that those servers haven't tampered with the data or the query results. So that’s one piece of the utopia: more affordable database services to power the business of the world.

Another piece is offering the Web3 industry a way to process much larger volumes of data than could ever live on-chain, even on the newer, more scalable chains like L2s. Space and Time is a solution that sits next to every major chain and supplements the compute and storage of the chains.

And the third, most important piece is that Space and Time offers a way to bring the trustless nature of blockchain to databases. The world's business runs on databases, and blockchain introduced the idea of trustlessness and provability; we're bringing that technology to databases.

Now, what does all of this mean? It means trustlessness in all industries: assurance that, for example, bankers can't manipulate their books and pretend they have reserves or assets that they don't.

Space and Time empowers that world, and can do so in an incredibly cost-effective way. It also means that blockchain technology can finally scale to onboard the world's business logic. If smart contracts can trustlessly ask questions about terabytes of traditional business data, we can finally usher in the vision of Web3, which is putting the world's business logic on-chain.

That’s the future we’re building at Space and Time.

