Skip to content

Script that reads a CSV file and constructs a SQL query string that inserts every row of this file into an SQL database -- you can also connect to an PostgreSQL database (by default) and actually run that generated insert command

License

Notifications You must be signed in to change notification settings

kevinmarquesp/csv_to_sql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CSV To SQL Script

Script that reads a CSV file and constructs a SQL query string that inserts every row of this file into an SQL database -- you can also connect to an PostgreSQL database (by default) and actually run that generated insert command.

I was studying SQL and I needed to pass an example CSV file to my database in order to do some operations and study the behavior of the SQL syntax, so I thought that it would be a really good quick project to rest a bit and let all this SQL stuff for later. Probably, this script isn't the best solution for that problem, so if you want a proper tool to do that, maybe you'll need to do a little research. But, by the way, if this little side project is useful to you, I would be happy.

The thing that I was studying with this project is to find a good and simple Python project structure that is easy to use/install and test. This happens to be a really simple command line tool, so it's better to hold the entire code base in a single file, so the user could just copy that source code and use it as an app -- the challenge was to make a testable command line tool that doesn't require installing any libraries, just using what Python already offers me by default.

project_script-to-convert-csv-file-into-sql-command-2024-01-20_19.36.02.mp4

Installation Process

As I said, the important file is the csv_to_sql/csv_to_sql.py file, so the way to go is just copy and paste the contents of this file in somewhere on your $PATH. You can just select the text from your browser, open your text editor and paste its contents, or you can just follow the steps below:

git clone https://github.com/kevinmarquesp/csv_to_sql
cp csv_to_sql/csv_to_sql/csv_to_sql.py ~/.local/bin/csv_to_sql
rm -rf csv_to_sql  #removes the cloned repository

Uninstall Process

To uninstall you'll need to remove this file from your system, so it depends on where you've installed it in the first place. But you followed the recommended steps to install in this document, you can uninstall with:

rm ~/.local/bin/csv_to_sql

Development & Contribution

This code should not have any external dependencies that the user should use python3 -m pip to install, the objective is to be plug and play tool. Also, I think everything becomes a lot easier -- for the user, at least -- if all the code base is condensed into a single file, which could be a challange to mantain, it's important to write good and readable code for that.

I'm using a simple "pythonic" styleguide, the only constraint is that each line should not be longer than 80 characters and that I'm using double quotes ("") as the main characters to denote a string -- that's because english has a bunch of ' in its grammar, and escaping each of them is just too annoying.

Tip

Every time you update the docstrings of the csv_to_sql.py file, you can run the build_docs.sh script to automatically update the contents of this README.md file. But you'll have to install the pydoc-markdown library from pip; its a command line utility that generates a Markdown documentation based on the docstrings inside a file/module.

About the tests, it has a similar philosophy. There is a single test script -- the test.py file -- that uses the default unittest library. You can run these tests with: python3 test.py. Also, it's a good idea to be creative and do something to run this tests constantly, you can use a watch command to run every time the csv_to_sql/csv_to_sql.py is modified or something like that, I personally likes to put it in a while sleep loop.

Code's Documentation (autogenerated)

def log(msg: str)

Simple function that replaces some special characters to their respective color codes, like replacing an [g], which means "green", to \033[32m of the string, then printing it on the screen.

  • msg: The message that you want to show on the terminal output.
def parse_arguments(args: list[str]) -> Namespace

Given an list of arguments -- e.g. ['argument', '-o', '--option', 'optton_value'] -- this function puts everything together and returns a Namespace object with all that arguments parsed to Python's types and easily accessible with the parsed_args.option notation.

  • args: List of arguments that the user has specified, maybe you would like to use it with the sys.argv list to access the command line arguments.
def join_list_format_sql(data: list[str]) -> str

Utility function that joins every instance of a list with a , character, and put that string between (). Creating a string that is compatible with SQL syntax, you can use it to select columns in a table or to list values for each column.

  • data: List of strings that will be joined together.
def escape_sql_characters(sql_str: str) -> str

Given a string, it will put a \ character for each character that could cause some trouble in you SQL string. It's important if an user is called Claire O’Connell for an example, the ' cound couse some syntax error -- also, it's important to avoid SQL injection.

  • sql_str: The string that you want to escape the characters;
def format_sql_row(row: list[str]) -> list[str]

The most complicated function, this function creates a list with strings that is compatible with SQL syntax. For an example: it replaces a "Foo" string for "E'Foo'" -- with special characters escaped -- and a "1" string to just "1". Maybe you'll need to check the test cases or the source code in order to understand that function well...

Note

For an example: If you give an ["52", "Rice", "20.7", "TRUE"] list, it will return a list that looks like ["52", "E'Rice'", "20.7", "TRUE"]

  • row: Is the list of strings that you want to format to be compatible with SQL syntax.
def get_sql_slices(file_path: str, dlmtr: str = ",")

Utility function that opens the file and parses the header and column values string slices that are compatible with the SQL syntax.

  • file_path: Path string to access the file contents;
  • dlmtr: This is , by default, but you can set a custom delimiter if your file is formatted in a different way.
def get_insert_query(file_path: str, dlmtr: str = ",") -> str

Open the specified file to format a SQL statement that inserts every row values on the .csv file into the table that has the same name of the .csv file. Also, it checks for erros when opening the file, if something goes wrong, it will exit the script with status 1.

  • file_path: Path string to access the file contents;
  • dlmtr: This is , by default, but you can set a custom delimiter if your file is formatted in a different way.
def connect_and_send(host: str, port: int, user: str, password: str,
                     db_name: str, insert_query: str) -> None

This is the most important function. By default, it uses the database authentication information provided in the command line arguments to connect to a PostgreSQL database and send that query. If you want this script to work with different databases, maybe you would like to edit the source code for that, and that's the function that you're looking for.

  • host: Hostname of your database server;
  • port: Port for the connection to your database server;
  • user: Username to access the database tables;
  • password: Password of your database user;
  • db_name: The database this script should access once it's connected;
  • insert_query: Query that this code generates by reading a .csv file.
def main(args: list[str]) -> None

Main function that parses command line arguments, logs information, generates SQL insert queries from CSV files, and sends them to a database. The function works as follows:

  1. Parses the command line arguments;
  2. Iterates over each CSV file path in the parsed arguments;
  3. Generates an SQL insert query for the current CSV file;
  4. If the print flag is set in the parsed arguments, it prints the insert query and breaks the loop; and
  5. Connects to the database and sends the insert query.
  • args: List of command line arguments.

About

Script that reads a CSV file and constructs a SQL query string that inserts every row of this file into an SQL database -- you can also connect to an PostgreSQL database (by default) and actually run that generated insert command

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published