Todor Bogosavljević

2024/08/25

Building a minimal file upload service

Programming Networking Software

Backstory

A long time ago I learned about a service called 0x0.st. It struck me as an incredibly cool file uploading service, since it had no web interface: you would just shoot files up to it using curl and get a URL back. Knowledge of this site sat in my head for years, and I didn’t really use it much. But recently, during my work day, I needed to transfer a file from a remote machine to my local computer. Usually this is a job for scp, but an odd set of circumstances ruled that out: I did not know the IP of the machine I was connecting to, and I had gained a shell using a third-party tool.

After considering running an SSH server on my local machine and doing the file transfer that way, I realized I could just upload the file to 0x0.st and download it on my local machine… Well, I could do that, but that service disallows CI/CD artifacts, and since I am currently employed by a company in the CI/CD space, I couldn’t do this.

I quickly found an alternative, x0.at, and my immediate problem was solved. But looking through the source code of this project, I noticed it was only ~330 lines of PHP code! This gave me the idea of building my own file hosting service, and I ended up deciding on Python to do this.

Why Python?

I opted for Python as it is very quick to develop in, but it has some downsides. The biggest downside is that it is very resource intensive - the file upload service uses 222MB of RAM! To put this into perspective, my whole self-hosted git site is currently using 73MB of RAM.

That being said, the biggest upside of Python is that I could make the whole service very quickly, and in far fewer lines of code.

Overview

A file hosting service is, at its heart, very simple. You get a file, which you then store to be accessed later. These files can be stored in a multitude of ways, but I opted to just store them in a ./uploads folder. To access them at a later point, I ended up using URLs with keys, such as https://up.x1b.dev/23674f.

The keys that correspond to the files were at first simply stored in a JSON file, but I switched over to glorious SQLite3 to future-proof it at least a bit.

Implementation

Now, implementing all this in just under 100 lines of code is easier said than done, even with Python. This is especially true because I opted for a SQL database, which is quite verbose to read from and write to. I had to cut down on comments and squeeze queries onto a single line to hit the target.

Let’s take a look at the main upload_file function:

@app.route('/', methods=['POST'])
def upload_file():
    file = request.files.get('file')
    if not file or file.content_length > MAX_FILE_SIZE:
        return "No file uploaded or file too large\n", 400
    
    key = uuid.uuid4().hex[:6]
    filepath = os.path.join(UPLOAD_FOLDER, f"{key}_{file.filename}")
    file.save(filepath)
    
    data[key] = {'type': 'file', 'path': filepath, 'expiry': time.time() + EXPIRATION_TIME}
    save_data()
    return f"{URL_PREFIX}{key}\n"

You can already see some jankiness in an effort to lower the line count, such as folding two different errors into one error that covers both cases. But let’s go back a bit:

This function is very simple: it “listens” on the base URL for POST requests that include a file, takes the file, and generates a 6-character hexadecimal key. The key is written to the database, and the file is saved to the upload directory with the key prefixed to its name. If we upload a test.pdf, it will look something like this:

23674f_test.pdf

Pretty simple and elegant. In the database it would look something like:

key     type  path                       url   expiry
23674f  file  ./uploads/23674f_test.pdf  null  1725030186.8701632

Now the keen-eyed amongst you might have spotted something interesting: this database structure is a crime against humanity, but some sacrifices had to be made to fit everything under the 100 line limit.
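For readers who want to poke at that structure, here is a minimal sketch of a table with the same five columns, plus a round trip back into a dictionary. The column types, the init_db and load_data helpers, and the in-memory database are all assumptions for illustration; the post only shows the column names and a sample row.

```python
import sqlite3


def init_db(conn):
    """Create the table; column types are a guess based on the sample row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "key TEXT PRIMARY KEY, type TEXT, path TEXT, url TEXT, expiry REAL)"
    )


def load_data(conn):
    """Read every row back into a dict keyed by the short key (hypothetical helper)."""
    rows = conn.execute("SELECT key, type, path, url, expiry FROM files")
    return {k: {"type": t, "path": p, "url": u, "expiry": e} for k, t, p, u, e in rows}


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
init_db(conn)
conn.execute(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    ("23674f", "file", "./uploads/23674f_test.pdf", None, 1725030186.8701632),
)
data = load_data(conn)
print(data["23674f"]["path"])
```

Python's None is stored as SQL NULL, which is why the url column reads as null for a plain file upload.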

Also, what is that “expiry” column? Let’s take a look at the save_data() function:

def save_data():
    with sqlite3.connect(DATA_FILE) as conn:
        conn.execute('DELETE FROM files')
        conn.executemany('INSERT INTO files VALUES (?, ?, ?, ?, ?)',
                         [(k, v['type'], v.get('path'), v.get('url'), v['expiry']) for k, v in data.items()])

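This is the function that actually writes to the database. It features a bit more of the SQL magic I mentioned earlier, but all in all, it’s not too bad. Every call clears the table and re-inserts the whole in-memory data dictionary, filling the following columns: key, type, path, url, and expiry. They are mostly self-explanatory, but just in case you find this a bit confusing: key is the 6-character identifier from the URL, type is the kind of entry ('file' for uploads), path is where the file lives on disk, url is empty (null) for file uploads, and expiry is the Unix timestamp after which the entry gets deleted.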

Let’s talk more about expiry:

Expiry & Threading

There are many possible ways to handle files expiring and being deleted, from storing the actual expiry date somewhere in the file name (kinda stupid, but so is this whole project, basically) to the one I opted to use: Python threads!

Now, if you have ever worked with any kind of threading / async stuff, you may be dreading what comes next, but I am happy to report that you won’t have any issues with synchronization or such while deploying this project.

Why? It’s pretty simple. No, I mean the implementation is simple. Take a look:

def cleanup_files():
    while True:
        time.sleep(3600)
        now = time.time()
        expired_keys = [k for k, v in data.items() if v['expiry'] < now]
        for k in expired_keys:
            if data[k]['type'] == 'file':
                os.remove(data[k]['path'])
            data.pop(k)
        save_data()

This function basically waits an hour (3600 seconds), then checks whether any files should be deleted. It does this by comparing each stored expiry time with the current time, and if the expiry time is less than the current time, it calls os.remove() to remove that file from the face of the Earth.

Now, this in itself would block the whole project, as no other code could run while this loop was sleeping, but the only place this function is called is in the line right after it:

threading.Thread(target=cleanup_files, daemon=True).start()

Now, God bless the Python developers for providing such a simple and elegant way to handle threads. This line creates a new thread (not a new process), whose job is to check if/when files should be deleted. Because it is a daemon thread, it does not keep the program alive on exit. If you don’t utilize threads in your Python project, please look into them! They are useful in a wide range of circumstances, especially when your code spends its time waiting (sleeping, or on network and disk I/O), and you’d be surprised at the benefits your project could gain from them. (Not that anyone uses Python for performance)
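The cleanup thread and the request handlers share the same data dictionary, and the code above relies on that working without any explicit synchronization. If that ever became a concern, one option is to guard the dictionary with a lock. The sketch below is an assumption-laden variant, not the project's code: data_lock, purge_expired, cleanup_loop, and the on_expired callback are hypothetical names, and a threading.Event replaces time.sleep() so the loop can be stopped cleanly.

```python
import threading
import time

data = {}                      # key -> record, shared between threads
data_lock = threading.Lock()   # hypothetical; the original code has no lock


def purge_expired(now):
    """Remove expired records from `data` and return them as (key, record) pairs."""
    with data_lock:
        expired = [k for k, v in data.items() if v["expiry"] < now]
        return [(k, data.pop(k)) for k in expired]


def cleanup_loop(stop, interval=3600, on_expired=print):
    """Purge once per interval until `stop` is set."""
    # Event.wait returns False on timeout and True once the event is set.
    while not stop.wait(interval):
        for key, record in purge_expired(time.time()):
            # In the real service this is where os.remove() and save_data() would go.
            on_expired(key, record)


stop_event = threading.Event()
worker = threading.Thread(target=cleanup_loop, args=(stop_event,), daemon=True)
worker.start()
stop_event.set()   # ask the loop to finish; a real service would leave it running
worker.join(timeout=1)
```

Request handlers would take the same lock around their own writes to data, so the list comprehension never iterates a dictionary that is changing underneath it.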

Deployment & Security

The security-conscious readers may have developed a certain unease and started sweating as soon as they started reading this blog post. Public file uploads, while simple, incur a huge security risk. Just allowing anyone to upload anything to your server is what we in the industry call: a dumb idea.

I have mitigated this by adding a new Linux user for the service, with permission to access only the ./uploads directory. This should limit the damage that an overly permissive user (let’s say my personal user account, with sudo privileges) would otherwise allow, since the service cannot access files or run commands it has no permissions for.
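A dedicated user limits what a bad upload can touch, but the request itself can be checked too. The sketch below is not part of the project; safe_name, stream_size, check_upload, and the 10 MB limit are all assumptions. It keeps only the last path component of the client-supplied filename, measures the upload by seeking rather than trusting a header, and returns a separate error for each failure.

```python
import io
import os

MAX_FILE_SIZE = 10 * 1024 * 1024  # assumed limit; the post does not state the real value


def safe_name(filename):
    """Keep only the final path component of a client-supplied filename."""
    name = os.path.basename(filename.replace("\\", "/"))
    # Fall back to a fixed name if nothing usable is left (e.g. "" or "..").
    return name if name not in ("", ".", "..") else "upload"


def stream_size(stream):
    """Measure a seekable stream without reading it into memory."""
    pos = stream.tell()
    stream.seek(0, os.SEEK_END)
    size = stream.tell()
    stream.seek(pos)  # restore the position so the data can still be saved
    return size


def check_upload(filename, stream, max_size=MAX_FILE_SIZE):
    """Return (message, status) for a rejected upload, or None if it is acceptable."""
    if not filename:
        return "No file uploaded\n", 400
    if stream_size(stream) > max_size:
        return "File too large\n", 413
    return None


print(safe_name("../../etc/passwd"))
print(check_upload("test.pdf", io.BytesIO(b"%PDF-1.7")))
```

In a Flask app, setting the MAX_CONTENT_LENGTH config value is another way to cap request size before the handler even runs.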

In conclusion

This has been a fun, quick little project, and I ended up getting something useful out of it! Perhaps in the future I may get the dumb idea to re-create this project in C, to see how resource-efficient we can get it. But for now, I am pretty happy with where we ended up.

Feel free to try it out over at my own host. And check out the source code here.
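If you would rather upload from a script than from curl, the request can be sketched with the standard library alone. The form field name file comes from upload_file above; build_multipart and upload are hypothetical helpers, and the URL you pass in is whatever host runs the service.

```python
import os
import urllib.request
import uuid


def build_multipart(field, filename, content):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload(url, path):
    """POST the file at `path` to `url` and return the response text (the new link)."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode().strip()
```

Calling upload() with the service's base URL and a local path should return the short link, assuming the server responds the way upload_file does.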

Thank you for reading, and as always - have a wonderful day!