Data Management With Python - 3
Table of Contents
Explore the Benefits of SQLAlchemy for Managing Relational Databases
In this blog post, we continue our journey of discovering effective ways to handle relational databases by introducing SQLAlchemy, a popular third-party library. This powerful tool offers both Core and ORM (Object Relational Mapping) features, making it a versatile solution for database management.
SQLAlchemy Core provides a Python-based abstraction of SQL, allowing for easy manipulation of databases. Meanwhile, the ORM component utilizes object-oriented modeling to represent database rows, making it a more intuitive approach to database management.
One of the biggest advantages of using SQLAlchemy is its ability to abstract away connection mechanisms. With SQLAlchemy, you can initialize the engine in the same manner, regardless of the type of database you’re connecting to. This means you can seamlessly work with a range of supported databases, including SQLite, PostgreSQL, MySQL, Oracle, MS-SQL Server, and many more.
Take the first step towards streamlining your database management with SQLAlchemy.
Connecting to the Database:
# Connecting
from sqlalchemy import create_engine,text
engine = create_engine('sqlite:///./books.db',echo = True, future=True)
conn = engine.connect()
In the code snippet above we are importing the engines factory and creating an engine. The URL-style parameter here indicates what we are connecting to. Instead of http, we specify the type of database. For this series, that’s sqlite. The path part indicates where the database is. For a database that used a server, this would have a hostname and a port. Because SQLite uses files, this is a file reference. I’m connecting to the books.db file in the current directory. The engine factory uses lazy loading, so the connection hasn’t been established yet.
The echo parameter turns on SQL echoing. Everything we do through this connection will print out the SQL equivalent.
Everything from here on will be done through my conn object. Now I’ll write a query. For this lesson, I’ll be solely using the text query object.
Retriving Data using text() and execute()
result = conn.execute(text("SELECT * FROM author"))
2023-01-23 13:55:52,112 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-01-23 13:55:52,117 INFO sqlalchemy.engine.Engine SELECT * FROM author
2023-01-23 13:55:52,119 INFO sqlalchemy.engine.Engine [generated in 0.00964s] ()
The .execute() method on the connection object runs a query. The text object takes a string as an argument containing raw SQL. You can see that it created a transaction—that’s the BEGIN—then did a select. The database was able to run the query in 0.00964 seconds. Pretty snappy. Let’s look at the results.
print(result.all())
[(1, 'Isaac', 'Asimov'), (2, 'Pearl', 'Buck'), (3, 'Tom', 'Clancy')]
result.all() returns a list of tuples, and the database being used here was populated with three authors in the previous post, good old Isaac, Pearl and Tom. All right, let’s make some changes now.
Let’s do an insert, we’ll need to execute a query.
Parameterized Inserts
conn.execute(text("INSERT INTO author (first_name,last_name) VALUES(:fn,:ln)"),[{"fn":"Stephen","ln":"King"},{"fn":"Sashi","ln":"Tharoor"}])
2023-01-23 14:04:23,480 INFO sqlalchemy.engine.Engine INSERT INTO author (first_name,last_name) VALUES(?,?)
2023-01-23 14:04:23,483 INFO sqlalchemy.engine.Engine [generated in 0.00239s] (('Stephen', 'King'), ('Sashi', 'Tharoor'))
<sqlalchemy.engine.cursor.CursorResult at 0x1f1a8b8ce80>
Notice that in the VALUES clause of the INSERT, I’m using :fn and :ln. The colon indicates that these are placeholder values.
SQLAlchemy sees these and knows to populate them with the parameters that you passed to text(). This next line is those parameters. passed in a list of dictionaries where each dictionary has an “fn” and “ln” key. These get mapped to the :fn and :ln placeholders in the query.
Let’s query for our authors again.
result = conn.execute(text("SELECT * FROM author"))
print(result.all())
[(1, 'Isaac', 'Asimov'), (2, 'Pearl', 'Buck'), (3, 'Tom', 'Clancy'), (4, 'Stephen', 'King'), (5, 'Sashi', 'Tharoor')]
And there you go. There are now four authors in the database.
Commit & Rollback
Remember that BEGIN statement several queries ago? I’m still inside of the same transaction.
These all get grouped together as part of the database. Calling .commit() closes the transaction off and makes all the changes permanent. So what are transactions good for? Well, if you get partway through a group of queries, you can undo all of them by rolling back the transaction.
conn.commit()
2023-01-23 14:09:58,347 INFO sqlalchemy.engine.Engine COMMIT
#Rollback example
conn.execute(text("INSERT INTO author (first_name,last_name) VALUES (:fn,:ln)"),[{"fn":"Not","ln":"Anauthor"}])
2023-01-23 14:11:54,544 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-01-23 14:11:54,547 INFO sqlalchemy.engine.Engine INSERT INTO author (first_name,last_name) VALUES (?,?)
2023-01-23 14:11:54,549 INFO sqlalchemy.engine.Engine [generated in 0.00400s] ('Not', 'Anauthor')
<sqlalchemy.engine.cursor.CursorResult at 0x1f1aca84d30>
result = conn.execute(text("SELECT * FROM author"))
2023-01-23 14:12:37,890 INFO sqlalchemy.engine.Engine SELECT * FROM author
2023-01-23 14:12:37,891 INFO sqlalchemy.engine.Engine [cached since 1006s ago] ()
print(result.all())
[(1, 'Isaac', 'Asimov'), (2, 'Pearl', 'Buck'), (3, 'Tom', 'Clancy'), (4, 'Stephen', 'King'), (5, 'Sashi', 'Tharoor'), (6, 'Not', 'Anauthor')]
And now there’s ‘Not’, ‘Anauthor’ cluttering up my data. Calling .rollback() instead of .commit() and everything has been undone to the point of the last commit.
conn.rollback()
2023-01-23 14:18:20,270 INFO sqlalchemy.engine.Engine ROLLBACK
result = conn.execute(text("SELECT * FROM author"))
2023-01-24 00:03:15,296 INFO sqlalchemy.engine.Engine SELECT * FROM author
2023-01-24 00:03:15,298 INFO sqlalchemy.engine.Engine [cached since 3.644e+04s ago] ()
print(result.all())
[(1, 'Isaac', 'Asimov'), (2, 'Pearl', 'Buck'), (3, 'Tom', 'Clancy'), (4, 'Stephen', 'King'), (5, 'Sashi', 'Tharoor')]
fiddling around with query results some more.
Same select as before but this time, instead of calling .all(), we’ll iterate over the result object. Each item in the result object iterator corresponds to a row in the database I can access the columns in a result by using the columns’ names: row.last_name and row.first_name, for example.
result = conn.execute(text("SELECT first_name,last_name FROM author"))
2023-01-24 00:18:24,922 INFO sqlalchemy.engine.Engine SELECT first_name,last_name FROM author
2023-01-24 00:18:24,925 INFO sqlalchemy.engine.Engine [generated in 0.00248s] ()
for dict_row in result.mappings(): # mappings method on result is used to return a dictonary with header column as the key and the row value as the value.
print(dict_row)
{'first_name': 'Isaac', 'last_name': 'Asimov'}
{'first_name': 'Pearl', 'last_name': 'Buck'}
{'first_name': 'Tom', 'last_name': 'Clancy'}
{'first_name': 'Stephen', 'last_name': 'King'}
{'first_name': 'Sashi', 'last_name': 'Tharoor'}
Note that the result object is temporal. Accessing the items in it consumes them. That’s why calling .all() now shows an empty list.
which is expected behavior. A resultset is context aware, that way if you read part of it you can go back and read the rest. The consequence of that is if you read all of it, it will be empty. Being able to re-play actually only makes sense in the case of a SELECT, for any other query you’d be double-inserting, or trying to delete something that was already gone. You can stash the results away in a list and then mess with the list if that suits your needs. Otherwise, you’ll need to re-run the query.
Parameterized Select
Just like with the INSERT statement, you can parameterize a SELECT statement.
result = conn.execute(
text("SELECT last_name FROM author WHERE last_name > :ln"),
{"ln":"C"}
)
print(result.all())
2023-01-24 00:27:28,884 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-01-24 00:27:28,887 INFO sqlalchemy.engine.Engine SELECT last_name FROM author WHERE last_name > ?
2023-01-24 00:27:28,891 INFO sqlalchemy.engine.Engine [generated in 0.00726s] ('C',)
[('Clancy',), ('King',), ('Tharoor',)]
Here, the WHERE is saying that I’m looking for authors whose last name is greater than capital “C”. Because I used a parameter to define this, I could actually have passed in the dictionary “ln”:“C” at runtime.
There’s the result of the query. Don’t forget that this is doing string comparison. Something that starts with capital “C”, like “Clancy”, and is bigger than a single letter, like “Clancy”, is actually greater than just capital “C”.
binding parameters to a statement: so that the behaviour of query can change at runtime.
stmt = text("SELECT last_name FROM author WHERE last_name > :ln") # :ln is the variable
stmt # this returns a text object
<sqlalchemy.sql.elements.TextClause object at 0x000001F1A8B894C0>
stmt=stmt.bindparams(ln = "C") # this is where we are giving values to the variable; and it returns the statemet with values to be called in a query
result = conn.execute(stmt)
print(result.all())
2023-01-24 00:32:24,205 INFO sqlalchemy.engine.Engine SELECT last_name FROM author WHERE last_name > ?
2023-01-24 00:32:24,206 INFO sqlalchemy.engine.Engine [generated in 0.00190s] ('C',)
[('Clancy',), ('King',), ('Tharoor',)]
# we can use the statement and bind new parameters:
stmt = stmt.bindparams(ln = "B")
result = conn.execute(stmt)
print(result.all())
2023-01-24 00:34:15,147 INFO sqlalchemy.engine.Engine SELECT last_name FROM author WHERE last_name > ?
2023-01-24 00:34:15,151 INFO sqlalchemy.engine.Engine [cached since 110.9s ago] ('B',)
[('Buck',), ('Clancy',), ('King',), ('Tharoor',)]
Closing Connection
# Clean-up
conn.close()
2023-01-24 00:35:02,025 INFO sqlalchemy.engine.Engine ROLLBACK
SQL Injections:
A word of caution when writing raw SQL. You might be tempted to think, Hey, this is all text. I can just concatenate or use f-strings! 🤔
And that would be a very bad idea. Doing it this way leaves you open to what’s called an SQL injection attack.
A clever person might pass you a last_name that would break your code or worse, do malicious things. You may have noticed that all the SQL calls have ended with a semicolon (;).
If I pass in a last_name here that has a semicolon and then, say, a DELETE statement, you’d be running both your INSERT and my DELETE, and that’s probably not what you want.
Parameters solve this problem.
They properly escape anything you pass in, so you should always use parameters.
The OWASP Top 10 vulnerabilities list dropped SQL injection to spot three in 2021, which is down from spot one in previous years.I’m not sure whether that’s because the coding community is finally waking up to these kinds of exploits or whether it’s just that spots one and two actually just surpassed it.