
The machine I am using is a MacBook Pro, 2019 (2.4 GHz Quad Core i5, 8GB, 256GB SSD, Big Sur 11.1).

The generated data would be random, with the following constraints:

- The area column would hold a six-digit area code (any six digits would do, no validation).

Aspects I was willing to compromise on were:

- I don't need the durability guarantee. That is, it is fine if the process crashes and all the data is lost.
- No need to use true random methods; pseudo-random methods from the stdlib are just fine.
- It may use my machine resources to the fullest: 100% CPU, 8GB of memory and gigabytes of SSD space.
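To make the constraints concrete, here is a minimal sketch of a stdlib-only generator for the area column; the function name is mine, not from the original script.

```python
import random

def random_area_code() -> str:
    """Return any six digits as a string -- no validation, per the constraint above."""
    # A pseudo-random stdlib source is enough here; true randomness is not required.
    return f"{random.randrange(1_000_000):06d}"
```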
Python is my go-to language for any kind of scripting. The standard library provides a nice SQLite module, using which I wrote my first version. In this script, I tried to insert 10M rows, one by one, in a for loop. This version took close to 15 minutes, which sparked my curiosity and made me explore further to reduce the time.
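This is only a sketch of what that naive approach could look like; the table name and single-column schema are assumptions, not the original script.

```python
import random
import sqlite3

def insert_naive(db_path: str, n_rows: int) -> None:
    # isolation_level=None puts the connection in autocommit mode,
    # so every INSERT below runs as its own transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS user (area CHAR(6))")  # assumed schema
    for _ in range(n_rows):
        area = f"{random.randrange(1_000_000):06d}"
        conn.execute("INSERT INTO user (area) VALUES (?)", (area,))
    conn.close()
```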
In SQLite, each insertion is atomic and is a transaction. Each transaction guarantees that it is written to disk, and thus could be slow. I tried different sizes of batch inserts and found 100,000 to be a sweet spot. With this simple change, the running time was reduced to 10 minutes. Here is the full code.
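Again, only a sketch of the batching idea, with the same assumed table and column names: build a batch of rows in memory, then insert it with executemany inside a single transaction.

```python
import random
import sqlite3

BATCH_SIZE = 100_000  # the sweet spot mentioned above

def insert_batched(db_path: str, n_rows: int) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS user (area CHAR(6))")  # assumed schema
    for _ in range(n_rows // BATCH_SIZE):
        batch = [(f"{random.randrange(1_000_000):06d}",) for _ in range(BATCH_SIZE)]
        with conn:  # one transaction per batch instead of one per row
            conn.executemany("INSERT INTO user (area) VALUES (?)", batch)
    conn.close()
```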
SQLite Optimisations

The script I had written is very simple, so I assumed there wasn't much room for optimisation. Secondly, I wanted the code to be simple and close to the daily-usage version. The next logical step was to look for database optimisations, and I started diving into the amazing world of SQLite. The internet is filled with many SQLite optimisation posts.

Turning off journal_mode results in no rollback journal, so we cannot roll back if any of the transactions fail; this disables the atomic commit and rollback capabilities of SQLite. By turning off synchronous, SQLite no longer cares about writing to disk reliably and hands that responsibility off to the OS: a write to SQLite may not mean it is flushed to the disk. The cache_size specifies how many memory pages SQLite is allowed to hold in memory; do not set this to a high value in production. In EXCLUSIVE locking mode, the lock held by the SQLite connection is never released. Setting temp_store to MEMORY will make it behave like an in-memory database.

The SQLite docs have a full page dedicated to these parameters, and they also list a bunch of other parameters. Here are some of the articles I read on the internet which helped me with these optimisation parameters: 1, 2, 3, 4, 5. I haven't tried all of them; the ones I selected provided a decent running time.
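A minimal sketch of where these PRAGMAs would be issued, right after opening the connection; the exact values are not given above, so treat them as illustrative assumptions rather than the tuned settings.

```python
import sqlite3

conn = sqlite3.connect("user.db")  # file name assumed
# Illustrative values only -- these trade durability and crash safety for speed,
# which is acceptable here because the generated data can always be recreated.
conn.executescript("""
    PRAGMA journal_mode = OFF;
    PRAGMA synchronous = 0;
    PRAGMA cache_size = 1000000;
    PRAGMA locking_mode = EXCLUSIVE;
    PRAGMA temp_store = MEMORY;
""")
```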
I rewrote the Python script again, this time including the fine-tuned SQLite parameters, which gave a huge boost and reduced the running time drastically:

- The naive for-loop version took about 10 minutes to insert 100M rows.
- The batched version took about 8.5 minutes to insert 100M rows.

PyPy

I have never used PyPy, and the PyPy homepage highlights that it is 4x faster than CPython, so I felt this was a good opportunity to try it and test out their claims. All I had to do was run my existing code, without any change, using PyPy. I was wondering whether I would have to make changes to get it to run; however, my existing code ran smoothly. It worked, and the speed bump was phenomenal: the batched version took only 2.5 minutes to insert 100M rows. (I am not affiliated with PyPy, but I would request that you consider donating to PyPy for their efforts.)
Busy Loop(?)

I wanted to get some idea of how much time Python is spending just in loops, so I removed the SQL instructions and ran the code:

- The batched version took 5.5 minutes in CPython.
- The batched version took 1.5 minutes in PyPy (again a 3.5x speed bump).

I rewrote the same in Rust, and the loop took only 17 seconds. So I decided to move on from Python and experiment further in Rust.
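A sketch of what removing the SQL instructions could look like, assuming the same batched structure as above: keep the loops and the random row generation, but never touch the database, so the timing reflects pure interpreter work.

```python
import random

BATCH_SIZE = 100_000

def busy_loop(n_rows: int) -> None:
    # Same loop structure as the batched inserter, minus every sqlite3 call:
    # each batch of rows is built and then simply thrown away.
    for _ in range(n_rows // BATCH_SIZE):
        batch = [(f"{random.randrange(1_000_000):06d}",) for _ in range(BATCH_SIZE)]
        del batch
```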
Rust

(Note: This is NOT a speed comparison post between Python and Rust. Both have very different goals and places in your toolkit.)

Just like Python, I wrote a naive Rust version where I inserted each row in a loop.