So I have a Django form that validates Excel files of scientific data, and it works well. It works iteratively: users can upload files as they accumulate new data for their study. The DataValidationView inspects the files each time and presents the user with an error report that lists issues in their data that they must fix.
We realized recently that a number of errors (but not all) can be fixed automatically, so I've been working on a way to generate a copy of the file with those fixes applied. So we rebranded the "validation" form page as a "build a submission" page. Each time they upload a new set of files, the intention is for them to still get the error report, but also to automatically receive a downloaded file with a number of fixes in it.
I learned just today that there's no way to both render a template and kick off a download at the same time, which makes sense. However, I had been planning to not let the generated file with fixes hit the disk.
Is there a way to present the template with the errors and automatically trigger the download without previously saving the file to disk?
This is my form_valid method currently (without the triggered download, but I had started to do the file creation before I realized that both downloading and rendering a template wouldn't work):
def form_valid(self, form):
    """
    Upon valid file submission, adds validation messages to the context of
    the validation page.
    """
    # This buffers errors associated with the study data
    self.validate_study()

    # This generates a dict representation of the study data with fixes and
    # removes the errors it fixed
    self.perform_fixes()

    # This sets self.results (i.e. the error report)
    self.format_validation_results_for_template()

    # HERE IS WHERE I REALIZED MY PROBLEM. I WANTED TO CREATE A STREAM HERE
    # TO START A DOWNLOAD, BUT REALIZED I CANNOT BOTH PRESENT THE ERROR REPORT
    # AND START THE DOWNLOAD FOR THE USER

    return self.render_to_response(
        self.get_context_data(
            results=self.results,
            form=form,
            submission_url=self.submission_url,
        )
    )
Before I got to that problem, I was compiling some pseudocode to stream the file... This is totally untested:
from io import BytesIO

import pandas as pd
from django.http import HttpResponse


def download_fixes(self):
    excel_file = BytesIO()
    xlwriter = pd.ExcelWriter(excel_file, engine='xlsxwriter')

    df_output = {}
    for sheet in self.fixed_study_data.keys():
        # Assumes self.fixed_study_data maps sheet names to dict
        # representations of each sheet (the original draft read from an
        # undefined dfs_dict here)
        df_output[sheet] = pd.DataFrame.from_dict(self.fixed_study_data[sheet])
        df_output[sheet].to_excel(xlwriter, sheet_name=sheet)

    # close() writes the workbook into the buffer; the separate save() call
    # is dropped because ExcelWriter.save() was removed in pandas 2.0
    xlwriter.close()

    # important step, rewind the buffer or when it is read() you'll get nothing
    # but an error message when you try to open your zero length file in Excel
    excel_file.seek(0)

    # set the mime type so that the browser knows what to do with the file
    response = HttpResponse(
        excel_file.read(),
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    )

    # set the file name in the Content-Disposition header
    response['Content-Disposition'] = 'attachment; filename=myfile.xlsx'

    return response
So I'm thinking either I need to:
- Save the file to disk and then figure out a way to make the results page start its download
- Somehow send the data embedded in the results template and have javascript turn it into a file download stream
- Save the file somehow in memory and trigger its download from the results template?
What's the best way to accomplish this?
UPDATED THOUGHTS:
I had recently done a simple trick with a tsv file where I embedded the file content in the resulting template with a download button that used javascript to grab the innerHTML of the tags around the data and start a "download".
I thought, if I encode the data, I could likely do something similar with the Excel file content. I could base64 encode it.
I reviewed past study submissions. The largest one was 115 kB. That size is likely to grow by an order of magnitude, but for now 115 kB is the ceiling.
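To get a feel for what base64 embedding would cost at that size, here is a minimal sketch. The function name `encode_for_template` is hypothetical, and the payload is just dummy bytes standing in for the generated workbook, not real Excel content.

```python
import base64
from io import BytesIO


def encode_for_template(buffer: BytesIO) -> str:
    """Return the buffer's full contents as a base64 ASCII string."""
    # getvalue() returns the whole buffer regardless of the current read
    # position, so no seek(0) is needed before encoding
    return base64.b64encode(buffer.getvalue()).decode("ascii")


# Dummy stand-in for the generated workbook: 115,000 zero bytes
payload = BytesIO(bytes(115_000))
encoded = encode_for_template(payload)

# base64 emits 4 characters for every 3 input bytes (rounded up),
# so the embedded string is about a third larger than the file
print(len(payload.getvalue()), len(encoded))
```

So a 115 kB file would add roughly 153 kB of text to the rendered page, and a file ten times that size would add about 1.5 MB.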
I googled to find a way to embed the data in the template and I got this:
import base64

with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

ctx["image"] = image_data

return render(request, 'index.html', ctx)
I was recently playing around with base64 encoding in javascript for some unrelated work, which leads me to believe that embedding is doable. I could even trigger the download automatically. Does anyone have any caveats to doing it this way?
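For concreteness, this is roughly the server-side half I have in mind, as an untested-in-Django sketch: build a base64 `data:` URI from the in-memory bytes and pass it to the template. The names `build_data_uri` and `XLSX_MIME` are hypothetical, and the input is dummy bytes rather than a real workbook.

```python
import base64

# Same content type as in the download_fixes draft
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_data_uri(file_bytes: bytes, mime_type: str = XLSX_MIME) -> str:
    """Build a base64 data: URI that a template could use as a link target."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Dummy bytes standing in for the generated workbook
uri = build_data_uri(b"not a real workbook")
print(uri[:90])
```

The resulting string could then go into the context and be used in the template as the `href` of a link with a `download` attribute, which javascript could click automatically. Whether browsers handle this well for much larger files is one of the caveats I'm unsure about.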