Clean up all temp files created during processing and storage#285
Clean up all temp files created during processing and storage#285dmarkow wants to merge 1 commit intostavro:masterfrom
Conversation
bee1af1 to
657fcb0
Compare
657fcb0 to
5ac609e
Compare
|
|
||
| # If this version was transformed in any way, a tempfile was created | ||
| # that we need to clean up. | ||
| if file.tempfile? do |
There was a problem hiding this comment.
It looks like this is the same code as in cleanup/2. I would suggest replacing this (and the ending return of result) with the function call cleanup(result, file) to minimize unnecessary code duplication.
|
|
||
| def new(%{filename: filename, binary: binary}) do | ||
| %Arc.File{binary: binary, file_name: Path.basename(filename)} | ||
| |> write_binary() |
There was a problem hiding this comment.
I would suggest keeping new/1 without side-effects. Although this might be a contrived example, imagine a situation where someone is transferring a very large file over the network and they are not using Plug.Upload or any related mechanism to do so. Now, they could store the filename and binary data as part of a GenServer state, but they might also want to store a copy of Arc.File instead, since it already contains the filename and binary data in its structure.
And sure, someone could easily create just use %Arc.File{...} to create a structure without side-effects, but I don't think new/1 would be an obvious place for a side-effect. Also imagine trying to test some code that requires an Arc.File struct but does not require an actual file to exist, or maybe can't write a file for some reason; this would have negative impacts on those test cases where some other flow attempts to use new/1 and the whole thing breaks even though the app code never directly called new/1.
Fixes #256
Arc creates temp files in the directory that
System.tmp_dirreturns, but never cleans up after itself. Running unchecked on a production server could result in running out of disk space. These temp files are created in three scenarios:Scenarios 1 and 2 seem very similar (converting something that isn't a local file path into a local file path), however scenario 1 was happening once per transformation (where 10 versions/transformations meant the binary data would be written to disk 10 separate times), whereas scenario 2 was happening once right at the beginning (the URL is retrieved and saved to disk immediately during
Arc.File.new/1).Scenario 3 creates a temp file for each transformation.
Changes:
Arc.File.new/1rather than once per transformation (the tradeoff is that even with no transformations, it'll still write once, but this matches how remote URLs are handled)Arc.File.ensure_pathas it was only there to convert binary data to a path which now happens automaticallytempfile?attribute to%Arc.File{}that can be used later for cleanup (I preferred this over just testing the path to see if it's in the temp dir, otherwise there's a chance of deleting the input file itself if it was also in the temp dir)