Querying Many Databases in Parallel in Python

I have some Python scripts that I use to access a bunch of databases to gather information such as the size of the database. Usually it is not that important how long they take so I have been running queries against a list of 100 databases one at a time and it can take a while depending on what I do. Coworkers of mine have worked on running Unix/Linux shell scripts in parallel to hit a bunch of systems like this more quickly. So, I thought I would look at how to do the same in Python. I am sure there are more complicated ways to do it, but I got good results based on the simplest example from this part of the Python documentation:

https://docs.python.org/3.8/library/multiprocessing.html

Here is a simple python script to get the size of the database files from four databases at the same time:

All I did was take the first example from the multiprocessing documentation and replace f() which squared a number with dbspace() which connects to a database and runs a query to get the total db size. I experimented with different Pool() sizes. It is 4 in the example.

I picked 82 non-production databases that I have access to and ran the same query sequentially in a loop and it took 375 seconds. I took the code above and increased the Pool to 32 processes and it ran in 24 seconds. That is about a 15x speedup.

I am sure that I could get a lot more complicated with all this stuff, but this works well with minimal effort.

Bobby

About Bobby

I live in Chandler, Arizona with my wife and three daughters. I work for US Foods, the second largest food distribution company in the United States. I have worked in the Information Technology field since 1989. I have a passion for Oracle database performance tuning because I enjoy challenging technical problems that require an understanding of computer science. I enjoy communicating with people about my work.

View all posts by Bobby →

This entry was posted in Uncategorized. Bookmark the permalink.

2 Responses to Querying Many Databases in Parallel in Python

Tim N. says:

December 11, 2020 at 10:01 am

Parallelizing tasks across server targets or across server => database targets is cool stuff. I started with parallelizing in 2005 using ClusterIt dsh (hard to find on Linux) and GNU Parallel.

I found it takes well designed set of helper tools/scripts to scale up a robust/reliable parallelizing infrastructure. Error checking at scale is tough.

Your example using GNU Parallel might look like this:
me@myhost:1950 [/home/me/Library/dsh]
$ cat z.sql
set heading off
set pagesize 0

column instance_name format a20
column mbytes format 999,999,999,999
with
my$view as
(
select bytes from dba_data_files union all
select bytes from dba_temp_files union all
select bytes from v$log
)
select max( i.instance_name ) instance_name ,( sum(bytes) / power( 2 ,20 ) ) mbytes
from my$view ,v$instance i
;
me@myhost:1950 [/home/me/Library/dsh]
$ time parallel ‘sqlplus -S -L {} @z.sql;’ ::: myuser/mypass@orcl1db myuser/mypass@orcl2db myuser/mypass@orcl3db myuser/mypass@orcl4db
orcl1db1 110,830

orcl2db1 6,030

orcl3db1 327,187

orcl4db1 54,037

real 0m4.80s
user 0m1.54s
sys 0m0.34s
me@myhost:1950 [/home/me/Library/dsh]
$
*****
I abbreviated the output, 4.8s hit 15 databases on a single server.

Ciao, Tim…

- Bobby says:
  
  December 11, 2020 at 1:04 pm
  
  Thanks Tim. I have been meaning to look at what my coworkers are using with Linux shell scripts. It is great to have your example and information as well.
  
  Bobby

Querying Many Databases in Parallel in Python

About Bobby

2 Responses to Querying Many Databases in Parallel in Python

Leave a Reply to Bobby Cancel reply

Subscribe to Blog via Email

Archives

Bobby Durrett\’s DBA Blog

Orafaq blog aggregator

Querying Many Databases in Parallel in Python

Share this:

About Bobby

2 Responses to Querying Many Databases in Parallel in Python

Leave a Reply to Bobby Cancel reply

Subscribe to Blog via Email

Archives

Bobby Durrett\’s DBA Blog

Orafaq blog aggregator