Get Folder Contents for Very Large Folder

#1

Hi,

I’m building a process to read files from a large folder (280k+ files). It should look for a specific file each day, but I will never have the complete file name, only a partial name. Get Folder Contents with a regex name filter works great for smaller folders, but with this many files it times out and fails.

Is there another way to read a specific file when all I have is the folder and a partial file name, perhaps a custom script? FYI, the partial file name will match exactly one file.

Any help is appreciated, thanks!

#2

Hi @wwylie.

By “partial file name”, do you mean a file name mask, for example Test_file_[MMDDYYYY].txt? What does the partial file name look like?

#3

The file names include a date and an identifier, but also random text
(e.g. MMDDYYYY_AIDKTHE1823AHD_TEST_FILE_SUEJ9834_97834).

So here, I would be looking for MMDDYYYY*TEST_FILE* with “*” being any text.
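
For reference, a mask like this can be turned into a regex by quoting the literal pieces and replacing each “*” with “.*”. Below is a minimal sketch in plain Java (the same JDK classes can be called from a Groovy custom script). The class name MaskDemo, the helper names, the sample date, and the assumption that the date prefix is formatted as MMddyyyy are all illustrative and not part of WorkFusion.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

class MaskDemo {
    // Build the mask for a given day, e.g. 2024-01-15 -> "01152024*TEST_FILE*".
    // Assumes the file name starts with the date as MMddyyyy.
    static String maskFor(LocalDate day) {
        return day.format(DateTimeFormatter.ofPattern("MMddyyyy")) + "*TEST_FILE*";
    }

    // Convert a mask to a regex: literal parts are quoted, each '*' becomes ".*".
    static Pattern maskToRegex(String mask) {
        String[] parts = mask.split("\\*", -1); // -1 keeps trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        // Hypothetical date, used only to make the example concrete.
        Pattern p = maskToRegex(maskFor(LocalDate.of(2024, 1, 15)));
        // prints true
        System.out.println(p.matcher("01152024_AIDKTHE1823AHD_TEST_FILE_SUEJ9834_97834").matches());
    }
}
```

Quoting the literal parts means characters such as “.” or “[” in a mask are matched as-is rather than treated as regex syntax.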

#4

Thank you. One more question: is this file the most recently modified one in the folder? If it is, you can try this custom script sample: https://kb.workfusion.com/display/RPAe/Code+Samples#CodeSamples-FindinglastmodifiedfileYellowWorkfusion

#5

Unfortunately no, it will not be the latest modified.

#6

Hi @wwylie,

Please find attached a sample Custom Action script that works fine with a large number of files in a folder.

Avoid the regex filter in the Get Folder Contents activity and run the script instead, as shown in the image below.

Samples.zip (249.2 KB)

Hope this resolves your problem. Have a great day :slight_smile:

#7

Thank you for the response. However, my issue is getting the Get Folder Contents action to work at all. Your example still uses Get Folder Contents to create a list of all the files, and that is where my process times out. It’s a network drive with 290k+ files, and the folder takes a while to load even when I just open it in File Explorer.

#8

Hi @wwylie.

I’ve created a small script that reads the files in a folder, compares each name with a mask (thanks @aravindhan_mr for the regex :slight_smile: ), and returns the matches in a List variable. It works for me, but I only tested it on a small folder with 20 files. Recorder variables in Studio: folder (String) and file_list (List).
Please see below:

import com.workfusion.studio.rpa.recorder.utils.ListComparator

@CustomScriptAction(
   input = ['folder'],
   output = 'file_list'
)

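def customScript() {
    // listFiles() returns null when the folder is missing or unreadable
    File[] files = new File(folder.toString()).listFiles()
    List<String> temp_list = new ArrayList<String>()

    if (files != null) {
        for (File file : files) {
            // Sample mask: digits, an underscore, word characters, then a
            // 3-character extension made of 't'/'x' (e.g. 123_abc.txt).
            // Replace it with a pattern for your own file names.
            // The name is checked first because isFile() hits the file system.
            if (file.getName().matches("[0-9]{1,}+\\_[a-zA-Z0-9_]{1,}+\\.[tx]{3}") && file.isFile()) {
                temp_list.add(file.getName())
            }
        }
    }
    file_list = RList.of(temp_list)
}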
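
If listFiles() is itself too slow on the network share, one alternative that may be worth trying is java.nio's Files.newDirectoryStream, which accepts a glob and iterates over entries one at a time instead of first building an array of every file. The sketch below is plain Java and has not been tested against a 290k-file network folder. The class name FindByGlob, the method findFirst, and the glob in the usage line are hypothetical, and wiring the result into the @CustomScriptAction variables is left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FindByGlob {
    // Returns the first regular file whose name matches the glob,
    // or null if nothing matches. Stops iterating at the first hit.
    static Path findFirst(Path folder, String glob) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, glob)) {
            for (Path entry : stream) {
                // Only entries that already matched the glob reach this check.
                if (Files.isRegularFile(entry)) {
                    return entry;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FindByGlob <folder> <glob>, e.g. a glob like "01152024*TEST_FILE*"
        Path hit = findFirst(Paths.get(args[0]), args[1]);
        System.out.println(hit == null ? "no match" : hit.getFileName().toString());
    }
}
```

Because the partial name identifies a single file, returning on the first match avoids walking the rest of the folder. If this is pasted into a Groovy script, note that the try-with-resources form needs Groovy 3 or later; on older versions the stream can be closed in a finally block instead.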