Joining Tables with Aggregate Functions in SQLite and Python3 for Complete Data Retrieval

SQLite and Python3: A Deep Dive into Joining Tables with Aggregate Functions

As a developer working with databases, it’s not uncommon to encounter complex queries that require joining multiple tables while aggregating data. In this article, we’ll delve into the world of SQLite and Python3, exploring how to join tables with aggregate functions like GROUP_CONCAT().

Understanding the Problem

The problem at hand involves a database schema consisting of five tables: scans, systems, ports, plugins, and maps. The goal is to retrieve data from these tables, specifically for the last scan day. We’re interested in getting the system’s IP address, protocol, port number, service name, and all plugins identified on that system’s port.

The provided query attempts to join these tables but results in incomplete data. To achieve our desired output, we need to employ aggregate functions like GROUP_CONCAT() to concatenate plugin names and descriptions.

Sample Database Schema

To better understand the problem, let’s examine the sample database schema:

Table 1: scans
scan_id     |    scan_date
1                2.2020
2                3.2020

Table 2: systems
system_id   |    ip           |   dns
1             123.456.789.1     one.dns.com
2             123.456.789.2     two.dns.com

Table 3: ports
port_id     |   protocol      |   port    |  service
1               tcp               22         ssh
2               udp               161        smtp
3               tcp               80         http

Table 4: plugins
plugin_id    |  pluginname     |  plugindescription
1                1234567890       Beschreibung1
2                2345678901       Beschreibung2

Table 5: maps
maps_id      |  system_id      |   port_id   |   plugin_id | scan_id
1                  1                  1              1          1
2                  1                  3              1          1
3                  1                  3              2          1
4                  1                  3              3          1
5                  2                  2              1          1
6                  2                  2              2          1
7                  2                  2              3          1
8                  1                  1              1          2
...

The Initial Query

The initial query attempts to join the tables with inner joins but results in incomplete data:

SELECT system_IP, protocol, port, service, PluginName 
FROM maps
INNER JOIN scans ON (maps.scan_id = scans.scan_id)
INNER JOIN systems ON (maps.system_id = systems.system_id)
INNER JOIN ports ON (maps.port_id = ports.port_id)
INNER JOIN plugins ON (maps.plugin_id = plugins.plugin_id)
WHERE scan_id=1

This query only returns the plugin name for each system-port combination without aggregating the descriptions.

Employing GROUP_CONCAT()

To achieve our desired output, we can employ the GROUP_CONCAT() function in SQLite. This function concatenates all rows with a specific separator. We’ll use it to concatenate the plugin names and descriptions.

SELECT systems.system_Id, ports.protocol, ports.port, ports.service, 
  GROUP_CONCAT(plugins.PluginName) AS pluginnames,
  GROUP_CONCAT(plugins.plugindescription) AS plugindescriptions
FROM maps
INNER JOIN scans ON (maps.scan_id = scans.scan_id)
INNER JOIN systems ON (maps.system_id = systems.system_id)
INNER JOIN ports ON (maps.port_id = ports.port_id)
INNER JOIN plugins ON (maps.plugin_id = plugins.plugin_id)
WHERE scans.scan_id=1
GROUP BY systems.system_Id, ports.protocol, ports.port, ports.service

In this query:

  • We join the tables as before but add two new columns using GROUP_CONCAT().
  • The first column concatenates all plugin names with a space separator.
  • The second column concatenates all plugin descriptions with a space separator.

By grouping by the system ID, protocol, port number, and service name, we ensure that each row in our output table corresponds to a unique combination of these fields.

Handling Null Values

When working with GROUP_CONCAT(), it’s essential to handle null values. If a plugin description is null for a particular plugin, it will be excluded from the concatenated string. To include null descriptions in the output, we can use the NULLIF() function:

SELECT systems.system_Id, ports.protocol, ports.port, ports.service, 
  GROUP_CONCAT(plugins.PluginName) AS pluginnames,
  GROUP_CONCAT(
    NULLIF(plugins.plugindescription, '')  
  ) AS plugindescriptions
FROM maps
INNER JOIN scans ON (maps.scan_id = scans.scan_id)
INNER JOIN systems ON (maps.system_id = systems.system_id)
INNER JOIN ports ON (maps.port_id = ports.port_id)
INNER JOIN plugins ON (maps.plugin_id = plugins.plugin_id)
WHERE scans.scan_id=1
GROUP BY systems.system_Id, ports.protocol, ports.port, ports.service

This way, even if a plugin description is null, it will be included in the output string.

Conclusion

In this article, we’ve explored how to join tables with aggregate functions like GROUP_CONCAT() in SQLite. By employing these functions, you can efficiently process large datasets and retrieve meaningful insights from your data. Remember to handle null values effectively when working with group concatenation to ensure accurate results.


Last modified on 2024-09-25