posted on Sun, Jul 21 '24 under tag: how-to

Manoj messaged me the other day saying how he was unable to upload a file to the “FromThePage” instance of Sahya. I started investigating it after 4 days.

I couldn’t find any error in the application log. And then Manoj told me that the uploads were indeed complete, but they were stuck in “Processing” in the queue. So I went and looked at the log of one, and it was like this:

** Invoke fromthepage:process_document_upload (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute fromthepage:process_document_upload
fetching upload with ID=19
found document_upload for 
	user=Manojk, 
	target collection=KNH, 
	file=/uploads/document_upload/file/19/KNH000004.pdf
creating temp directory /tmp/fromthepage_uploads/19
copying /var/www/fromthepage/public/uploads/document_upload/file/19/* to /tmp/fromthepage_uploads/19
unzip_tree(/tmp/fromthepage_uploads/19)
	unzip_tree considering /tmp/fromthepage_uploads/19/KNH000004.pdf
	unzip_tree considering /tmp/fromthepage_uploads/19/process.log
unpdf_tree(/tmp/fromthepage_uploads/19)
	unpdf_tree considering /tmp/fromthepage_uploads/19/KNH000004.pdf)
		unpdf_tree Found pdf /tmp/fromthepage_uploads/19/KNH000004.pdf
		gs -r300x300 -dJPEGQ=30 -o '/tmp/fromthepage_uploads/19/KNH000004/page_%04d.jpg' -sDEVICE=jpeg '/tmp/fromthepage_uploads/19/KNH000004.pdf'
GPL Ghostscript 9.55.0 (2021-09-27)
Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 2.
Page 1
Page 2
		unpdf_tree Extracted to /tmp/fromthepage_uploads/19/KNH000004
		unpdf_tree No metadata file exists at /tmp/fromthepage_uploads/19/metadata.yml
	unpdf_tree considering /tmp/fromthepage_uploads/19/process.log)
convert tiffs from tree(/tmp/fromthepage_uploads/19)
	untiff_tree considering /tmp/fromthepage_uploads/19/KNH000004)
Found directory /tmp/fromthepage_uploads/19/KNH000004
convert tiffs from tree(/tmp/fromthepage_uploads/19/KNH000004)
	untiff_tree considering /tmp/fromthepage_uploads/19/KNH000004/page_0001.jpg)
	untiff_tree considering /tmp/fromthepage_uploads/19/KNH000004/page_0002.jpg)
	untiff_tree considering /tmp/fromthepage_uploads/19/KNH000004.pdf)
	untiff_tree considering /tmp/fromthepage_uploads/19/process.log)
compress tree(/tmp/fromthepage_uploads/19)
compress_tree handling /tmp/fromthepage_uploads/19/KNH000004)
Found directory /tmp/fromthepage_uploads/19/KNH000004
compress tree(/tmp/fromthepage_uploads/19/KNH000004)
compress_tree handling /tmp/fromthepage_uploads/19/KNH000004/page_0001.jpg)
Found image /tmp/fromthepage_uploads/19/KNH000004/page_0001.jpg
/var/www/fromthepage/lib/image_helper.rb:81: warning: passing a block without an image argument is deprecated
"Compressed file is now 12884403 at quality 90"
/var/www/fromthepage/lib/image_helper.rb:81: warning: passing a block without an image argument is deprecated
"Compressed file is now 10553270 at quality 80"
/var/www/fromthepage/lib/image_helper.rb:81: warning: passing a block without an image argument is deprecated
"Compressed file is now 8526727 at quality 70"
/var/www/fromthepage/lib/image_helper.rb:81: warning: passing a block without an image argument is deprecated
"Compressed file is now 8201708 at quality 60"
/var/www/fromthepage/lib/image_helper.rb:81: warning: passing a block without an image argument is deprecated
"Compressed file is now 7735076 at quality 50"
/var/www/fromthepage/lib/image_helper.rb:81: warning: passing a block without an image argument is deprecated
"Compressed file is now 6464711 at quality 40"
/var/www/fromthepage/lib/image_helper.rb:81: warning: passing a block without an image argument is deprecated
"Compressed file is now 6197795 at quality 30"
/var/www/fromthepage/lib/image_helper.rb:81: warning: passing a block without an image argument is deprecated
"Compressed file is now 5738645 at quality 20"
compress_tree handling /tmp/fromthepage_uploads/19/KNH000004/page_0002.jpg)
Found image /tmp/fromthepage_uploads/19/KNH000004/page_0002.jpg
rake aborted!
Magick::ImageMagickError: width or height exceeds limit `/tmp/fromthepage_uploads/19/KNH000004/page_0002.jpg' @ error/cache.c/OpenPixelCache/3909
/var/lib/gems/3.0.0/gems/rmagick-4.2.6/lib/rmagick_internal.rb:1593:in `read'
/var/lib/gems/3.0.0/gems/rmagick-4.2.6/lib/rmagick_internal.rb:1593:in `block in initialize'
/var/lib/gems/3.0.0/gems/rmagick-4.2.6/lib/rmagick_internal.rb:1592:in `each'
/var/lib/gems/3.0.0/gems/rmagick-4.2.6/lib/rmagick_internal.rb:1592:in `initialize'
/var/www/fromthepage/lib/image_helper.rb:80:in `new'
/var/www/fromthepage/lib/image_helper.rb:80:in `block in compress_image'
/var/www/fromthepage/lib/image_helper.rb:77:in `downto'
/var/www/fromthepage/lib/image_helper.rb:77:in `each'
/var/www/fromthepage/lib/image_helper.rb:77:in `compress_image'
/var/www/fromthepage/lib/tasks/ingestor.rake:155:in `block in compress_tree'
/var/www/fromthepage/lib/tasks/ingestor.rake:147:in `each'
/var/www/fromthepage/lib/tasks/ingestor.rake:147:in `compress_tree'
/var/www/fromthepage/lib/tasks/ingestor.rake:151:in `block in compress_tree'
/var/www/fromthepage/lib/tasks/ingestor.rake:147:in `each'
/var/www/fromthepage/lib/tasks/ingestor.rake:147:in `compress_tree'
/var/www/fromthepage/lib/tasks/ingestor.rake:63:in `process_batch'
/var/www/fromthepage/lib/tasks/ingestor.rake:35:in `block (2 levels) in <main>'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/task.rb:279:in `block in execute'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/task.rb:279:in `each'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/task.rb:279:in `execute'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/task.rb:199:in `synchronize'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/task.rb:199:in `invoke_with_call_chain'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/task.rb:188:in `invoke'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:160:in `invoke_task'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:116:in `block (2 levels) in top_level'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:116:in `each'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:116:in `block in top_level'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:125:in `run_with_threads'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:110:in `top_level'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:83:in `block in run'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:186:in `standard_exception_handling'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/lib/rake/application.rb:80:in `run'
/usr/share/rubygems-integration/all/gems/rake-13.0.6/exe/rake:27:in `<top (required)>'
/usr/bin/rake:25:in `load'
/usr/bin/rake:25:in `<main>'
Tasks: TOP => fromthepage:process_document_upload

Magick::ImageMagickError: width or height exceeds limit seems to be the line that we need to look at. Searching for that I could find this issue reported in fromthepage repository itself, pointing to a discussion that talks about how policy.xml of ImageMagick sets the maximum width of a file that it processes.

I checked the file on the server and found the relevant line like this:

  <policy domain="resource" name="width" value="16KP"/>

16KP. I thought the file wouldn’t be wider than that. So I went to verify.

❯ pdfimages -list KNH000004.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2250  2823  rgb     3   8  jpeg   no         4  0    72    72  946K 5.1%
   2     1 image    4000  2250  rgb     3   8  jpeg   no         7  0    72    72 1471K 5.6%

The maximum width was 4000 pixel in the second page. Why would it then exceed 16KP (which presumably means 16 * 1000 pixels).

So I went on the server to inspect the actual files.

$ cd /tmp/fromthepage_uploads/19/KNH000004
$ ls
page_0001.jpg  page_0002.jpg
$ identify page_0002.jpg 
page_0002.jpg JPEG 9375x16667 9375x16667+0+0 8-bit sRGB 9.23583MiB 0.000u 0:00.000

What, this was 16667 pixels wide! It seems like there is some processing step which increases the size.

The line gs -r300x300 -dJPEGQ=30 -o '/tmp/fromthepage_uploads/19/KNH000004/page_%04d.jpg' seemed like it might be doing this. I tried running this.

❯ gs -r300x300 -dJPEGQ=30 -o './page_%04d.jpg' -sDEVICE=jpeg './KNH000004.pdf'
GPL Ghostscript 10.03.1 (2024-05-02)
Copyright (C) 2024 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 2.
Page 1
Page 2

❯ identify page_0002.jpg 
page_0002.jpg JPEG 9375x16667 9375x16667+0+0 8-bit sRGB 9.23583MiB 0.000u 0:00.000

Sure enough, this was the issue.

So I edited the policy.xml to:

  <policy domain="resource" name="width" value="32KP"/>

Now I deleted the stuck uploads and uploaded the file again. It failed again!

I tried to manually convert the image and it still failed. So I changed the values surrounding like

  <policy domain="resource" name="memory" value="2GiB"/>
  <policy domain="resource" name="map" value="2GiB"/>
  <policy domain="resource" name="area" value="512MP"/>
  <policy domain="resource" name="disk" value="2GiB"/>

Now it worked!

Like what you are reading? Subscribe (by RSS, email, mastodon, or telegram)!