Cells purging question

Hey, we’ve been trialling Cells for a while now and have noticed files are not being purged. The retention policy is set as “30 days max”. The documentation doesn’t say I need to set up a cron job as I had to do for Pydio v8, so I assumed it would do its thing automatically. Am I missing something?

Hello @scott.bentley,

If you are referring to versioning, then you should have only the last 30 versions of a file (one for each of the previous 30 days).

To make sure I understood your request: you want files that are 30 days old to be deleted permanently from your Cells?

Thanks @zayn,

Yes, we want the file to be permanently deleted. We do not use the product as a file management tool and specifically do not want users storing tons of documents forever. Our use-case is to provide file transfer and limited collaboration between staff and clients. Purging files regularly is necessary to (a) relieve staff from having to remember to do it themselves and (b) ensure we aren’t using any more disk space than absolutely necessary for our use-case purposes.

Hey @zayn, just wanted to check in on this and see if you have anything else to add. Is purging a feature that is likely to be implemented in the future? Should I attempt to write a plugin of some sort? Where would I start if I wanted to write a plugin that would perform file purging?

Thanks,

Scott

A plugin development guide with some boiler-plate code would be, indeed, very nice to have.

One way would be to use the SDK to create a custom task that deletes the files (run from cron or a systemd timer).

The go-sdk lets you CRUD the resources, so you can list and delete data (you could list the data, check whether each item is older than X days, and then delete it).

If you are stuck, tell me and I’ll write a snippet for you.

You could also script something with the cells-client

Thanks Zayn. I was looking into this and then COVID hit. I’m back looking at it again now though!

So, I installed the cec client and wanted to list all existing cells, but I don’t see how to do this. If I run “cec ls” I only get the workspaces and cells of the authenticated user (admin, in this case). I need a listing of ALL cells and/or files so that I can then delete those older than 30 days. Would you be able to provide me with a snippet that might accomplish this?

Thank you so much, and I hope you and your loved ones have been well throughout this pandemic!

Scott

Hey @zayn , I don’t want to be pushy and I understand if there’s more pressing issues you need to respond to, however if there is an example of using the cec to find and remove files from accounts other than the logged in account, could you please point it out to me?

Otherwise, can you please advise how I can find files that are older than a number of days and administratively clear them for ALL accounts?

Also, a related question, how do I force an expiry on all public shares so the user cannot create shares without at least a minimum expiry of, for example, 60 days?

Hello @scott.bentley,

Here is a Go snippet to help you get started. It lists nodes with the ListAdminTree call of the AdminTreeService (which lets you see all the nodes), and I have added comments marking where you need to add your own functions.

package main

import (
	"log"
	"path/filepath"
	"strconv"
	"time"

	cells_sdk "github.com/pydio/cells-sdk-go"
	"github.com/pydio/cells-sdk-go/client/admin_tree_service"
	"github.com/pydio/cells-sdk-go/example/cmd"
	"github.com/pydio/cells-sdk-go/models"
)

var (
	config = &cells_sdk.SdkConfig{
		Url:        "https://my-cells.com",
		ClientKey:  "cells-front",
		User:       "admin",
		Password:   "",
		SkipVerify: false,
	}
)

func main() {
	ctx, cli, err := cmd.GetApiClient(config)
	if err != nil {
		log.Fatalf("Could not get API client, cause: %v\n", err)
	}

	params := &admin_tree_service.ListAdminTreeParams{Body: &models.TreeListNodesRequest{
		// Lists all the nodes under the personal datasource (meaning, personal-files/admin, personal-files/johndoe, etc...)
		Node: &models.TreeNode{Path: "personal"},
		Recursive: true,
	}, Context: ctx}

	// ListAdminTree lists all the nodes
	result, err := cli.AdminTreeService.ListAdminTree(params)
	if err != nil {
		log.Fatal(err)
	}

	for _, n := range result.Payload.Children {

		// ignores .pydio files
		if filepath.Base(n.Path) == ".pydio" {
			continue
		}

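		// Parse the node's MTime (Unix seconds, stored as a string)
		i, err := strconv.ParseInt(n.MTime, 10, 64)
		if err != nil {
			log.Printf("skipping %s: cannot parse MTime %q: %v", n.Path, n.MTime, err)
			continue
		}
		tu := time.Unix(i, 0)
		d := time.Since(tu)

		// Checks whether the node is older than 30 days.
		// Both sides are time.Duration values, so the units match.
		if d > 30*24*time.Hour {
			// add function to delete the nodes
			// see TreeService
			// placeholder call: pass real delete parameters instead of nil
			cli.TreeService.DeleteNodes(nil)
		}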

	}
}

@zayn thanks that looks great! Of course, now I need to figure out how to use GO as I’ve never touched it before.

Was it your intention that this snippet should be used to create a separate app/script or is this something that could be turned into a plugin? I don’t see any resources in the documentation about developing plugins and I was kind of hoping to make this something that could be used from within Cells itself so other admins wouldn’t need to use command line scripts for this purpose.

Also, I see that you’ve used AdminTreeService to list all nodes, and TreeService to delete nodes. Can I do this with the REST API? Can an admin delete any node using TreeService as long as they have the path reference?

Thanks so much, and sorry to be asking so many questions lol

Hey @scott.bentley,

Yes, all of those operations are available through the REST API as well. Sorry that I wrote the snippet in Go; it is the main language I use for my scripts.

You could write the same script in bash, or in another language such as Java (cells java sdk).

Adding this as a plugin would be possible. There is no direct documentation for it, but you could read the code and see how it is done for the other plugins.

Hey @zayn,

I’ve been playing with Postman and the Cells API trying to make this work and I’ve come a long way but stumbled at the finish line. I have managed to use the a/tree/admin/list endpoint to list all the files under cells belonging to users, as you can see in this screenshot:

And in this second screenshot you can see that I am trying to delete one of these files from the “cellsdata” storage datasource, under the user “scott.bentley@hhangus.com”, however, the response says I cannot access the workspace. Now, I’m doing this as the “admin” account, so there should not be permissions issues involved. I’m wondering if the issue has to do with the fact that “cellsdata” is a storage “datasource” and not a “workspace”. Can you please let me know if I’m doing something wrong, or provide guidance?

Hello @scott.bentley,

cellsdata is actually the name of the datasource that holds all the cells, and the API to delete nodes (/tree/delete) only works with paths that are not from the admin view (meaning the admin list).

Unfortunately, I just realized that there is no API to delete with the admin paths.

Another solution that the devs gave me is to create a workspace with all of your datasources as roots, and then use the usual API /a/tree/stat (see /tree/stat) on that workspace to list the nodes and perform the actions you need on them.

Sorry if I misguided you with the adminTreeService API; it seems that this API is only used for specific cases in the application.

Thanks @zayn

I actually had been considering trying that as I noticed that the Workspaces configuration would allow me to make any disk location part of the root. I was reluctant to try it though as I wasn’t sure what effect it might have on permissions or other settings. If you think this is a good idea, I’ll try it out, thanks.

Getting back to the idea of a plugin, is there documentation of any sort that would help me understand how to write one, where to even start? I looked at the github code and as I have zero experience with whatever framework* you’re using and have never even used GO, I’m more than a little lost as to where to begin.

* What framework are you using, anyway?

@zayn,

Ok, so I’ve made progress!

(1) I created a workspace called “Cells Data” with the slug “cellsdataws” with Read Only basic permissions.

(2) Assigned Read/Write permissions to the Administrator Role.

(3) I tried to list the workspace contents using a/tree/stat but this only lists the content of one node at a time, which would require me to write a recursive script to run through all the nodes to find what I want. Instead of this, I am using a/tree/admin/list to actually find contents of the DATASOURCE:cellsdata with the LEAF filter, and this is great because I can get a full list of all the files and their MTime from a single API call.

(4) I used a/tree/delete to delete the node, however, because the node must reside in a workspace I replaced the datasource name (cellsdata) with the workspace slug (cellsdataws) in the path and it works…sort of. So, it actually moved the node into the recycle bin, which is not quite what I wanted.

(5) I used a/tree/delete to delete the recycle_bin and that cleared it out too.

So, this works!

Thank you sooo much!
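
The path substitution described in step (4) can be sketched as a small Go helper: take a path returned by a/tree/admin/list, which starts with the datasource name, and replace that first segment with the workspace slug before handing it to a/tree/delete. This is only an illustration under assumptions: toWorkspacePath is a hypothetical name, the example file path is made up, and the datasource name and slug are the ones from this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// toWorkspacePath swaps the leading datasource segment of an admin-tree
// path for a workspace slug. It returns false when the path does not sit
// under the given datasource. Hypothetical helper, not part of any SDK.
func toWorkspacePath(adminPath, datasource, slug string) (string, bool) {
	p := strings.TrimPrefix(adminPath, "/")
	if p == datasource {
		return slug, true
	}
	prefix := datasource + "/"
	if !strings.HasPrefix(p, prefix) {
		return "", false
	}
	return slug + "/" + p[len(prefix):], true
}

func main() {
	// "cellsdata" and "cellsdataws" come from the thread; the file path
	// below is an invented example.
	for _, p := range []string{
		"cellsdata/some-cell/report.pdf",
		"personal/admin/notes.txt",
	} {
		ws, ok := toWorkspacePath(p, "cellsdata", "cellsdataws")
		fmt.Println(p, ws, ok)
	}
}
```

Matching on the datasource name followed by a slash, rather than on a bare string prefix, avoids rewriting paths from a different datasource whose name merely starts with the same letters.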