Script that reads a CSV file and constructs a SQL query string that inserts every row of this file into an SQL database -- you can also connect to an PostgreSQL database (by default) and actually run that generated insert command.
I was studying SQL and I needed to pass an example CSV file to my database in order to do some operations and study the behavior of the SQL syntax, so I thought that it would be a really good quick project to rest a bit and let all this SQL stuff for later. Probably, this script isn't the best solution for that problem, so if you want a proper tool to do that, maybe you'll need to do a little research. But, by the way, if this little side project is useful to you, I would be happy.
The thing that I was studying with this project is to find a good and simple Python project structure that is easy to use/install and test. This happens to be a really simple command line tool, so it's better to hold the entire code base in a single file, so the user could just copy that source code and use it as an app -- the challenge was to make a testable command line tool that doesn't require installing any libraries, just using what Python already offers me by default.
project_script-to-convert-csv-file-into-sql-command-2024-01-20_19.36.02.mp4
As I said, the important file is the csv_to_sql/csv_to_sql.py
file, so the way to go is just copy and paste the contents of this file in somewhere on your $PATH
. You can just select the text from your browser, open your text editor and paste its contents, or you can just follow the steps below:
git clone https://github.com/kevinmarquesp/csv_to_sql
cp csv_to_sql/csv_to_sql/csv_to_sql.py ~/.local/bin/csv_to_sql
rm -rf csv_to_sql #removes the cloned repository
To uninstall you'll need to remove this file from your system, so it depends on where you've installed it in the first place. But you followed the recommended steps to install in this document, you can uninstall with:
rm ~/.local/bin/csv_to_sql
This code should not have any external dependencies that the user should use python3 -m pip
to install, the objective is to be plug and play tool. Also, I think everything becomes a lot easier -- for the user, at least -- if all the code base is condensed into a single file, which could be a challange to mantain, it's important to write good and readable code for that.
I'm using a simple "pythonic" styleguide, the only constraint is that each line should not be longer than 80 characters and that I'm using double quotes (""
) as the main characters to denote a string -- that's because english has a bunch of '
in its grammar, and escaping each of them is just too annoying.
Tip
Every time you update the docstrings of the csv_to_sql.py
file, you can run the build_docs.sh
script to automatically update the contents of this README.md
file. But you'll have to install the pydoc-markdown
library from pip
; its a command line utility that generates a Markdown documentation based on the docstrings inside a file/module.
About the tests, it has a similar philosophy. There is a single test script -- the test.py
file -- that uses the default unittest
library. You can run these tests with: python3 test.py
. Also, it's a good idea to be creative and do something to run this tests constantly, you can use a watch command to run every time the csv_to_sql/csv_to_sql.py
is modified or something like that, I personally likes to put it in a while sleep loop.
def log(msg: str)
Simple function that replaces some special characters to their
respective color codes, like replacing an [g]
, which means "green", to
\033[32m
of the string, then printing it on the screen.
- msg: The message that you want to show on the terminal output.
def parse_arguments(args: list[str]) -> Namespace
Given an list of arguments -- e.g. ['argument', '-o', '--option', 'optton_value']
-- this function puts everything together and returns a
Namespace
object with all that arguments parsed to Python's types and
easily accessible with the parsed_args.option
notation.
- args: List of arguments that the user has specified, maybe you would
like to use it with the
sys.argv
list to access the command line arguments.
def join_list_format_sql(data: list[str]) -> str
Utility function that joins every instance of a list with a ,
character, and put that string between ()
. Creating a string that is
compatible with SQL syntax, you can use it to select columns in a table or
to list values for each column.
- data: List of strings that will be joined together.
def escape_sql_characters(sql_str: str) -> str
Given a string, it will put a \
character for each character that
could cause some trouble in you SQL string. It's important if an user is
called Claire O’Connell
for an example, the '
cound couse some syntax
error -- also, it's important to avoid SQL injection.
- sql_str: The string that you want to escape the characters;
def format_sql_row(row: list[str]) -> list[str]
The most complicated function, this function creates a list with strings that is compatible with SQL syntax. For an example: it replaces a "Foo" string for "E'Foo'" -- with special characters escaped -- and a "1" string to just "1". Maybe you'll need to check the test cases or the source code in order to understand that function well...
Note
For an example: If you give an ["52", "Rice", "20.7", "TRUE"]
list, it
will return a list that looks like ["52", "E'Rice'", "20.7", "TRUE"]
- row: Is the list of strings that you want to format to be compatible with SQL syntax.
def get_sql_slices(file_path: str, dlmtr: str = ",")
Utility function that opens the file and parses the header and column values string slices that are compatible with the SQL syntax.
- file_path: Path string to access the file contents;
- dlmtr: This is
,
by default, but you can set a custom delimiter if your file is formatted in a different way.
def get_insert_query(file_path: str, dlmtr: str = ",") -> str
Open the specified file to format a SQL statement that inserts every
row values on the .csv
file into the table that has the same name of the
.csv
file. Also, it checks for erros when opening the file, if something
goes wrong, it will exit the script with status 1.
- file_path: Path string to access the file contents;
- dlmtr: This is
,
by default, but you can set a custom delimiter if your file is formatted in a different way.
def connect_and_send(host: str, port: int, user: str, password: str,
db_name: str, insert_query: str) -> None
This is the most important function. By default, it uses the database authentication information provided in the command line arguments to connect to a PostgreSQL database and send that query. If you want this script to work with different databases, maybe you would like to edit the source code for that, and that's the function that you're looking for.
- host: Hostname of your database server;
- port: Port for the connection to your database server;
- user: Username to access the database tables;
- password: Password of your database user;
- db_name: The database this script should access once it's connected;
- insert_query: Query that this code generates by reading a
.csv
file.
def main(args: list[str]) -> None
Main function that parses command line arguments, logs information, generates SQL insert queries from CSV files, and sends them to a database. The function works as follows:
- Parses the command line arguments;
- Iterates over each CSV file path in the parsed arguments;
- Generates an SQL insert query for the current CSV file;
- If the
print
flag is set in the parsed arguments, it prints the insert query and breaks the loop; and - Connects to the database and sends the insert query.
- args: List of command line arguments.