IBM CLOUD SQL QUERY is a reasonably new service that has appeared in the IBM Cloud catalog. Being a one-time Oracle DBA, where SQL was one of my daily staples, I thought I’d have a closer look….
What is is?
SQL Query provides the ability to query data sets that have been uploaded to IBM Cloud Object Storage (i.e. csv, JSON, ORC or Parquet files), using Structured Query Language (SQL!). What’s really cool about this is that you don’t need to provision a database of any kind, you just provision the SQL Query service and start writing SQL queries against your files, using a GUI. The query data is returned into the GUI and it’s also stored back into Object Storage for later use.
For example, the UK Government publishes lots of data, much of which you can download in csv format, at https://data.gov.uk/. I’ve downloaded one or two of the files published by UK Trade and Investment, which details payments they have made by month, up to December 2015. Having downloaded the csv file, I’ve uploaded it to a bucket in my object storage and then created an instance of SQL Query so I can delve into the data.
While I’ve used a csv file created by the UK Government, the same principles here apply to any csv file (or JSON, ORC or Parquet file), including those you generate as part of your business.
Once I’ve fired up SQL Query, I’m ready to go.
So, in the screenshot above, you can see my SQL query in the top portion of the screen and the results in the bottom section. Here, I’m just using a pretty basic ‘select *’ (if you’re not familiar with SQL, this basically means ‘select everything’. The ‘from’ part of the query is the location and the name of the file in object storage (here, it’s the ‘ams03’ site, a bucket called ‘datsets’ and a file called ‘UKTI__Programme__November_2015__published_.csv.
The ‘into’ clause denotes the location and bucket where I want details and the output of my query saved – so I get a csv file as an output, which is stored in Object Storage that I can then later download and look at in say Excel.
And More Complex Stuff?
You can pretty much write any SQL-compliant SELECT query you want. For example, I wrote this query against the same file:
select SUPPLIER, SUM(AMOUNT) from cos://s3.ams03.objectstorage.service.networklayer.com/datsets/UKTI__Programme__November_2015__published_.csv group by SUPPLIER order by sum(AMOUNT) DESC into cos://s3.ams03.objectstorage.service.networklayer.com/datsets-output
This essentially lists each supplier and shows how much each of them has received as a sum of the AMOUNT column in the csv file. Of course, you can also query more than one file at a time by joining them and so on, as you would tables in a SQL-compliant database.
Why is this useful?
This is a great service for querying data that is in csv, JSON, ORC or Parquet format. There’s no need to create and configure a relational database or import the data into it to start querying your data and as someone that used to do that kind of stuff as a day job, that’s a massive time saving as well as a large saving on database administration overheads. Of course, this is also server-less, so there’s no need to spin up and pay for servers. Because the service is backed by Cloud Object Storage and the data files and the query outputs are stored in buckets, you can also make them highly resilient to outages and highly available. The SQL Service is also available to your applications via the REST API, via the ibmcloudsql package, if using Python or via Cloud Functions.
Find out More…
You can find out more by:
Checking out the documentation by clicking here
Learn more about IBM Cloud Object Storage by clicking here