1. build

uDeviceX uses a wrapper u.make for building the source. This allows to load modules when needed.

1.1. verbosity

The verbosity can be controlled during the compilation process:

Shows full compilation and linking commands:

u.make LOG=

Profile compilation time:

u.make LOG='@time -f "%e $<"'

Hide compilation messages:

u.make LOG=@

1.2. Arch Linux

u.make MPI_VARIANT=ompi-cxx HDF5_CXXFLAGS= HDF5_LIBS=-lhdf5 NVCC_LIBS=-L/opt/cuda/lib64

or

MAKEFLAGS="MPI_VARIANT=ompi-cxx HDF5_CXXFLAGS= HDF5_LIBS=-lhdf5 NVCC_LIBS=-L/opt/cuda/lib64" \
				u.test test/compile

1.3. dependencies

Dependencies must be generated after adding a file or changing includes. Run from src/ directory:

../tools/deps

2. coding conventions

2.1. modules

uDeviceX consists of modules. Modules are as independant as possible. They consist of a folder with one or more compilation units, and an interface. Whenever it is possible, the structures are hidden from the client of the module.

2.2. naming

2.2.1. variables

  • names of local variables are short and understandable from the context

  • names of global variables are as descriptive as possible

  • arrays of simple structures x have names xx, eg. pp is an array of particles, cc is an array of colors, ii is an array of indices * array of small dimentionality may use an enum type for readability, e.g.

enum {X, Y, Z};
int r[3];
r[X] = rx;
r[Y] = ry;
r[Z] = rz;

2.2.2. functions

  • a function name is descriptive on its own or inside a module.

  • arguments are ordered as follow:

    • input is at the beginning

    • input/output come after input and start with a commented io

    • output comes after input/output and start with a comment or o depending on the context

    • sizes have higher priority than arrays

    • workspace comes at the end ans starts with a commented w

  • do not use references, use pointers instead

Example of valid function declarations:

void read_int_from_file(const char *fname, /**/ int *n, int *ii);
void forward_euler(int n, const Force *ff, /* io */ Particle *pp);

Function names part of a module interface start with the module name. Functions private to a module do not need to contain the module name if the context make it clear. Use the static qualifier for functions with compilation-unit scope.

2.3. file structure

  • no include guards; no "headers in headers" if easily avoidable

  • all cuda kernels are in a separate header .h file

  • a module is implemented inside its own directory A

    • it has its own object A/imp.cu or A/imp.cpp (.cpp prefered if possible)

    • interface A/imp.h

    • implementation can be done in separate files inside A/imp/ directory

    • internal cuda (private to module) code are inside A/dev/ directory

    • cuda interface inside A/dev.h (for client use)

  • modules can have submodules, which follow the same structure as above, e.g. submodule B inside module A belongs to A/B/ directory

2.4. style

for emacs: the following c++ mode is used

(defun u/c-indent-common ()
  (setq c-basic-offset 4
	indent-tabs-mode nil))

(defun u/c++-indent ()
  (u/c-indent-common)
  (c-set-offset 'innamespace [0]))

(defun u/c-indent () (u/c-indent-common))

(add-hook 'c-mode-hook   'u/c-indent)
(add-hook 'c++-mode-hook 'u/c++-indent)

(setq c-default-style
      '((java-mode . "java")
        (awk-mode  . "awk")
        (c-mode    . "k&r")
        (cc-mode   . "k&r")))

(add-to-list 'auto-mode-alist '("\\.cu\\'" . c++-mode))
(add-to-list 'auto-mode-alist '("\\.h\\'"  . c++-mode))

3. documentation

The uDeviceX documentation is written in Asciidoc format and converted to html. This operation requires asciidoctor (converter to html) and pygments.rb (for code highlighting).

3.1. installation

3.1.1. Mac OS X

gem install asciidoctor
gem install pygments.rb

3.1.2. Arch linux

sudo pacman -S asciidoctor
gem install pygments.rb

3.2. build

When writing documentation, it is useful to build it and view it on a local server.

From doc/:

./tools/start (1)
./tools/view  (2)
./tools/stop  (3)
1 start a local server (to be done only once)
2 open the main page in browser from local server
3 stop the local server
./configure (1)
make -j     (2)
1 make dependencies (to be run after adding a file to documentation)
2 build documentation for local server

3.3. Deploy

The deploy command cleans, builds and pushes the documentation on github pages https://github.com/amlucas/udoc. The commit message on the udoc repository points on the current uDeviceX commit. It is preferable to have a clean local repository before deploying:

git clean -xdf
./tools/deploy

4. algo: algorithm helpers

4.1. edg: mesh edge data

Set and get val on edges of the mesh.

void edg_ini(int md, int nv,                     int *hx);
void edg_set(int md, int i, int j, int val, /**/ int *hx,       int *hy); (1)
int  edg_get(int md, int i, int j,         const int *hx, const int *hy); (2)
int  edg_valid(int md, int i, int j,         const int *hx); (3)
1 set val
2 get val

md is a maximum degree, nv number of vertices, h[xy] storage of size nv*md.

4.2. Kahan summation

Summation stable to numerical error.

void kahan_sum_ini(/**/ KahanSum**);
void kahan_sum_fin(KahanSum*);

void kahan_sum_add(KahanSum*, double input); (1)
double kahan_sum_get(const KahanSum*); (2)
double kahan_sum_compensation(const KahanSum*); (3)
1 add input to the sum
2 return current sum
3 return current value of compensation for diagnostic

Example:

kahan_sum *k;
kahan_sum_ini(&k);
for (i = 0; i < n; i ++)
    kahan_sum_add(kahan_sum, a[i]);
sum = kahan_sum_get(kahan_sum);
kahan_sum_fin(k);

4.3. minmax

compute extends of chunks of particles

  • input is an array of particles, the number of chunks and the number of particles per chunk.

  • output is the bounding box in for each chunk of particles.

4.3.1. example

Consider the simple case on 1D:

input:
	particles positions:
	0.1 0.5 0.2   0.3 1.2 1.0
	number of chunks: 2
	number of particles per chunk: 3

output:
	min:
	0.1 0.3
	max:
	0.5 1.0

4.3.2. interface

void minmax(const Particle *pp, int nv, int nobj, /**/ float3 *lo, float3 *hi);

4.4. scan: prefix sum

exclusive prefix sum implementation for integer data

input is of size n, output is of size n + 1 as in the following example:

input:  4 5 1  2  3  -
output: 0 4 9 10 12 15

4.4.1. interface

void scan_apply(const int *input, int size, /**/ int *output, /*w*/ Scan*); (1)
void scan_ini(int size, /**/ Scan**);                                       (2)
void scan_fin(Scan*);                                                       (3)
1 perform scan operation on input of size size
2 allocate workspace for performing scan of size size
3 free workspace

4.5. Scalars

A read-only wrapper to provide uniform accsses to an array n scalars.

void scalars_float_ini(int n, const float*, /**/ Scalars**);
void scalars_double_ini(int n, const double*, /**/ Scalars**);
void scalars_vectors_ini(int n, const Vectors*, int dim, /**/ Scalars**);

void scalars_zero_ini(int n, /**/ Scalars**);
void scalars_one_ini(int n, /**/ Scalars**);

void scalars_fin(Scalars*);
double scalars_get(const Scalars*, int i);

See also Vectors.

4.6. Vectors

A read-only wrapper to provide uniform accsses to an array n vectors of size three. The module does not copy data.

void vectors_float_ini(int n, const float*, /**/ Vectors**); (1)
void vectors_postions_ini(int n, const Particle*, /**/ Vectors**); (2)
void vectors_postions_edge_ini  (const Coords*, int n, const Particle*, /**/ Vectors**); (3)
void vectors_postions_center_ini(const Coords*, int n, const Particle*, /**/ Vectors**); (4)
void vectors_velocities_ini(int n, const Particle*, /**/ Vectors**); (5)
void vectors_zero_ini(int n, /**/ Vectors**); (6)

void vectors_fin(Vectors*);
void vectors_get(const Vectors*, int i, /**/ float r[3]); (7)
1 array of float packed as x0 y0 z0 x1 y1 z1
2 particle positions
3 particle positions relative to domain edge
4 particle positions relative to domain center
5 particle velocity
6 array of zeros
7 return a vector i

Example,

#define n 2
Vectors  *pos;
float r[3], data[3*n] = {1, 2, 3,   4, 5, 6};
vectors_float_ini(n, data, &pos);
vectors_get(pos, 1, /**/ r);
vectors_fin(pos);

See also Scalars.

4.7. utils

helper device functions

Warp reduction: sum all elements within a warp. This is a collective operation, all threads of the warp must execute it.

template <typename T>
_I_ T warpReduceSum(T v)  (1)
_I_ float2 warpReduceSum(float2 val)  (2)
_I_ float3 warpReduceSum(float3 val)  (3)
1 generic warp reduction. thread 0 of the warp gets the result.
2 specific implementation for float2
3 specific implementation for float3

5. clist

Helpers to find which particles are in a given cell.

  • reorder data according to particle positions into cells

  • access particles within a given cell

5.1. example

consider a 1D example of particles with positions in domain [0, 4)

pp: 2.5 0.1 1.1 0.5 1.6

After cell list building (we take 1 as cell size), the above data becomes

pp: 0.1 0.5 1.1 1.6 2.5  -
cc: 2       2       1    0
ss: 0       2       4    5

where pp is the particle array, cc (counts) is the number of particles per cell and ss (starts) is the prefix sum of cc.

Accessing all particles in cell i=1 ([1,2)) can be done as follows:

loop: j = 0, ..., cc[i]
      id = ss[i] + j
      p = pp[id]
      work(p)

The above can be easily extended in 3D, only ordering changes.

5.2. data structures

5.2.1. Clist

struct Clist {
    int3 dims;
    int ncells;
    int *starts, *counts;
};
  • dims dimensions of the grid

  • ncells number of cells

  • counts number of particles per cell

  • starts exclusive prefic scan of the above

5.2.2. Map

Helper to build the cell lists (hidden from client)

struct ClistMap {
    int nA;       /* number of source arrays to build the cell lists, e.g remote+bulk -> 2 */
    uchar4 **ee;  /* cell entries */
    uint *ii;     /* codes containing: indices of data to fetch and array id from which to fetch */
    Scan *scan;   /* scan workspace */
    long maxp;    /* maximum number of particles per input vector */
};
  • nA number of source arrays

  • ee cell entries: one array per source array, containing a list of uchar4 with entries (xcid, ycid, zcid, sid)

    • xcid: x cordinate of the cell, in 0, …​, dims.x

    • ycid: y cordinate of the cell, in 0, …​, dims.y

    • zcid: z cordinate of the cell, in 0, …​, dims.z

    • sid: source array id, in 0, …​, nA

  • ii indices of particles to fetch. can be decoded into the source array id and the particle id inside this array

  • scan scan workspace, see scan

5.3. interface

allocate/deallocate

void clist_ini(int LX, int LY, int LZ, /**/ Clist **c);
void clist_fin(/**/ Clist *c);

void clist_ini_map(int maxp, int nA, const Clist *c, /**/ ClistMap **m);
void clist_fin_map(ClistMap *m);

build interface:

void clist_ini_counts(Clist *c);
void clist_subindex(bool project, int aid, int n, const PartList lp, /**/ Clist *c, ClistMap *m);
void clist_build_map(const int nn[], /**/ Clist *c, ClistMap *m);

/* special for fluid distribution */
void clist_subindex_local(int n, const PartList lp, /**/ Clist *c, ClistMap *m);
void clist_subindex_remote(int n, const PartList lp, /**/ Clist *c, ClistMap *m);

void clist_gather_pp(const Particle *pplo, const Particle *ppre, const ClistMap *m, long nout, /**/ Particle *ppout);
void clist_gather_ii(const int *iilo, const int *iire, const ClistMap *m, long nout, /**/ int *iiout);

/* quick cell build for single array */
void clist_build(int nlo, int nout, const Particle *pplo, /**/ Particle *ppout, Clist *c, ClistMap *m);

The fields can be accessed from

const int* clists_get_ss(const Clist *c);
const int* clists_get_cc(const Clist *c);
int3       clists_get_dim(const Clist *c);
int        clists_get_n(const Clist *c);
const uint* clist_get_ids(const ClistMap *m);

Device interface: code/decode map ids

_I_ uint clist_encode_id(int s, int pid)
_I_ void clist_decode_id(uint c, /**/ int *s, int *pid)

5.4. algorithm

The cell build process is linear in number of particles np and n log(n) in number of cells n. The process to build cell lists is as follows:

- build ee and counts: O(np)
  - set counts[i] = 0 for all i
  - for each particle, compute its cell id cid
  - get unique subindex k within the cell by updating cc[cid]
- scan counts to get starts O(nlog n)
- construct ii: O(np)
  - for each particle with id i, compute new index j = ss[cid] + k[i]
  - set ii[j] = i
- Gather data: O(np)
  - new particle vector pp1 from pp: pp1[i] = pp[ii[i]]

6. cnt: contact forces

contact interactions between objects (rigid objects, rbc membranes)

6.1. interface

allocate/deallocate

void cnt_ini(int maxp, int rank, int3 L, int nobj, /**/ Contact **c);
void cnt_fin(Contact *c);

computing forces:

void cnt_build_cells(int nw, const PairParams **prms, const PaWrap *pw, /**/ Contact *c); (1)
void cnt_bulk(const Contact *cnt, int nw, const PairParams **prms, PaWrap *pw, FoWrap *fw); (2)
void cnt_halo(const Contact *cnt, int nw, const PairParams **prms, PaWrap *pw, FoWrap *fw, Pap26 all_pp, Fop26 all_ff, const int *all_counts); (3)
1 build cell lists for local objects
2 compute the interactions between local objects particles
3 compute the interactions between local and remote objects particles

7. color

7.1. purpose

Each particle of the solvent has an assigned color when the multi_solvent flag is enabled. It is helpful for modeling different solvent with different viscosities, surface tensions etc.

7.2. modules

Some utility modules are implemeted in /src/color.

7.2.1. flux

recolor particles crossing the face of the periodic domain in the positive direction x, y or z

struct int3;
struct Particle;
struct Coords;

struct Rnd;

void color_linear_flux(const Coords*, int3 L, int dir, int color, int n, const Particle *pp, /**/int *cc);

void color_tracers(const Coords *coords, int color, const float R, const float Po, int n, const Particle *pp, int *cc);
void decolor_tracers(const Coords *coords, int color, const float R, const float Po, int n, const Particle *pp, int *cc);

void tracers_ini(const int n);

void tracers_fin();
  • dir is the direction x, y or z

  • color is the new color

  • pp are input particles of size n

  • cc are output colors (stay the same if does not cross the face)

8. comm: generic communicator

generic communicator with neighboring ranks (halo).

8.1. purpose

Communicate data with the neighboring ranks

8.2. data structures

hBags: buffers on the host, contains all necessary information of the data to communicate:

struct hBags {
    data_t *data[NBAGS]; /* data on the host                    */
    int         *counts; /* size of the data                    */
    int capacity[NBAGS]; /* capacity of each frag (elem number) */
    size_t bsize;        /* size of one datum in bytes          */
};

dBags: buffers on the device

struct dBags {
    data_t *data[NBAGS]; /* data on the device         */
};

CommBuffer: wrapper to serialize hBags data. Hidden from user.

Comm: contains the communication related variables. It is hidden from user.

8.3. allocation mod

User can choose how the buffers are allocated with the enum type AllocMod. This is used in the function ini and fin. allocation mode and free mode are assumed to be the same

currently supported allocation modes:

    HST_ONLY,   /* only host bags allocated                 */
    DEV_ONLY,   /* only device bags allocated               */
    PINNED,     /* both host and device pinned              */
    PINNED_HST, /* host pinned; no device memory            */
    PINNED_DEV, /* host pinned; device global memory on gpu */
    NONE        /* no allocation                            */

8.4. interface

8.4.1. memory management

void comm_bags_ini(AllocMod fmod, AllocMod bmod, size_t bsize, const int capacity[NBAGS], /**/ hBags *hb, dBags *db);
void comm_bags_fin(AllocMod fmod, AllocMod bmod, /**/ hBags *hb, dBags *db);
void comm_ini(MPI_Comm cart, /**/ Comm **c);
void comm_fin(/**/ Comm *c);
void comm_buffer_ini(int nbags, const hBags *hbb, CommBuffer**);
void comm_buffer_fin(CommBuffer*);
  • bags alloc: Given two structures hBags and dBags, ini allocates the buffers on host and device. ini expects 2 allocation modes:

    • fmod: allocation mode for fragment buffers

    • bmod: allocation mode for bulk buffer

  • comm alloc: initialize Comm structure

  • comm_buffer: initialize from bags capacities.

8.4.2. communication

Communication happens between host bags (see hBags structure). It needs 3 entities:

  • receiver bags

  • sender bags

  • a Comm for communicating through MPI

The interface is as follows:

void comm_post_recv(hBags *b, Comm *c);           (1)
void comm_post_send(const hBags *b, Comm *c);     (2)

void comm_wait_recv(Comm *c, /**/ hBags *b);      (3)
void comm_wait_send(Comm *c);                     (4)
1 call MPI asynchronous recv and store requests in s
2 call MPI asynchronous send and store requests in s
3 wait for recv requests
4 wait for send requests

For more than one bags, it is recommanded to use the CommBuffer structure to group the messages into one:

void comm_post_recv(CommBuffer *cb, Comm *c);
void comm_post_send(const CommBuffer *cb, Comm *c);

void comm_wait_recv(Comm *c, /**/ CommBuffer *cb);
void comm_wait_send(Comm *c);

8.4.3. getters

Convenient get functions:

int    comm_get_number_capacity(int i, const hBags *b); (1)
size_t comm_get_byte_capacity(int i, const hBags *b);   (2)
1 get the number of elements (not bytes) that can be stored in the fragment i of the bag
2 get the number of bytes that can be stored in the fragment i of the bag

Convert data from bag to static array:

template <typename BAGT, typename T, int N>
static void bag2Sarray(BAGT bags, Sarray<T*, N> *buf)

9. conf: runtime configuration

9.1. purpose

  • wrapper for the libconfig library

  • holds configurations from 4 sources:

    • set by program (priority 4)

    • arguments (priority 3)

    • configuration file (priority 2)

    • default parameters (priority 1)

When retrieving a value, the source with higher priority is used. If not found in a source, the value is retrieved from the next highest priority. If no source contains the parmeter, an error is raised.

9.2. data structure

enum {
    EXE, /* from program setters */
    ARG, /* from arguments       */
    OPT, /* from additional file */
    DEF, /* from default file    */
    NCFG
};

struct Config {
    config_t c[NCFG];
    config_t *r; /* what have been read */
};

Consists of an array of config_t descriptors, ordered by priority. Indices correspond to the enum keys. The first key has the highest priority, second key has second highest priority etc.

The r field is the configuration history, i.e what is read by the program.

9.3. interface

Allocate and free the structure:

void conf_ini(/**/ Config **c);
void conf_fin(/**/ Config *c);

Read data from the default, optional and command line config:

void conf_read(int argc, char **argv, /**/ Config *cfg);

Lookup variables from the configuration:

void conf_lookup_int(const Config *c, const char *desc, int *a);
void conf_lookup_float(const Config *c, const char *desc, float *a);
void conf_lookup_bool(const Config *c, const char *desc, int *a);
void conf_lookup_string(const Config *c, const char *desc, const char **a);
void conf_lookup_vint(const Config *c, const char *desc, int maxn, int *n, int a[]);
void conf_lookup_int3(const Config *c, const char *desc, int3 *a);
void conf_lookup_vfloat(const Config *c, const char *desc, int maxn, int *n, float a[]);
void conf_lookup_float3(const Config *c, const char *desc, float3 *a);
void conf_lookup_vstring(const Config *c, const char *desc, int maxn, int *n, const char **ss);

Note that the program raises an error if the argument is not found or if an error occured (e.g. wrong type).

A convenient set of functions to look for variable in a given namespace:

void conf_lookup_int_ns(const Config *c, const char *ns, const char *desc, int *a);
void conf_lookup_float_ns(const Config *c, const char *ns, const char *desc, float *a);
void conf_lookup_bool_ns(const Config *c, const char *ns, const char *desc, int *a);
void conf_lookup_string_ns(const Config *c, const char *ns, const char *desc, const char **a);
void conf_lookup_vint_ns(const Config *c, const char *ns, const char *desc, int maxn, int *n, int a[]);
void conf_lookup_int3_ns(const Config *c, const char *ns, const char *desc, int3 *a);
void conf_lookup_vfloat_ns(const Config *c, const char *ns, const char *desc, int maxn, int *n, float a[]);
void conf_lookup_float3_ns(const Config *c, const char *ns, const char *desc, float3 *a);
void conf_lookup_vstring_ns(const Config *c, const char *ns, const char *d, int maxn, int *n, const char **ss);

Lookup variables from the configuration:

bool conf_opt_int(const Config *c, const char *desc, int *a);
bool conf_opt_float(const Config *c, const char *desc, float *a);
bool conf_opt_bool(const Config *c, const char *desc, int *a);
bool conf_opt_string(const Config *c, const char *desc, const char **a);
bool conf_opt_vint(const Config *c, const char *desc, int maxn, int *n, int a[]);
bool conf_opt_int3(const Config *c, const char *desc, int3 *a);
bool conf_opt_vfloat(const Config *c, const char *desc, int maxn, int *n, float a[]);
bool conf_opt_float3(const Config *c, const char *desc, float3 *a);
bool conf_opt_vstring(const Config *c, const char *desc, int maxn, int *n, const char **ss);

The program does not raise error if the variable is not found. The return value is true if the variable has been found and false otherwise.

The client may overwrite variables in the config object:

void conf_set_int(const char *desc, int a, Config *c);
void conf_set_vint(const char *desc, int nelem, const int a[], Config *cfg);
void conf_set_int3(const char *desc, int3 a, Config *cfg);
void conf_set_float(const char *desc, float a, Config *cfg);
void conf_set_vfloat(const char *desc, int nelem, const float a[], Config *cfg);
void conf_set_float3(const char *desc, float3 a, Config *cfg);
void conf_set_bool(const char *desc, int a, Config *cfg);
void conf_set_string(const char *desc, const char *a, Config *cfg);

desc contains the names of the nested groups. separated by a . or a /.

Example:

const char *desc = "main.sub.a";

will correspond to

main = {
    sub = {
        a = ...
    }
}

The configuration can be dumped by using

void conf_write_exe(const Config *cfg, FILE *stream);     (1)
void conf_write_history(const Config *cfg, FILE *stream); (2)
1 dump the fields set by the program to a stream
2 dump the fields read by the program to a stream

10. Control

  • velocity controller

  • open boundary conditions

  • density controller

10.1. Velocity controller

control velocity by modifying the imposed hydrostatic force the force is modified using a PID controller https://en.wikipedia.org/wiki/PID_controller.

10.1.1. Controlled variables

The controlled variable is the (global) body force value. The state of the system is represented by

\[\mathbf{U} = \frac 1 n \sum\limits_{i=1}^n \mathbf{f} \left( \mathbf{r_i}, \mathbf{v_i} \right)\]

where \(( \mathbf{r_i}, \mathbf{v_i})\) are the \(i^{th}\) particle position and velocity, \(n\) is the number of particles and \(\mathbf{f}\) is a given transformation.

There is therefore 3 (independant) controlled variables.

the transformation is the way of averaging the velocity:

cartesian transformation keeps the velocity and does not depends on position:

\[\mathbf{f} \left( \mathbf{r}, \mathbf{v} \right) = \mathbf{v}\]

radial transformation returns the velocity in polar coordinates scaled by the radial position. the z component corresponds to cartesian transformation:

\[\mathbf{f} \left( \mathbf{r}, \mathbf{v} \right) = \left( r [v_x \cos(\theta) + v_y \sin(\theta)], r[- v_x \sin(\theta) + v_y \cos(\theta)], v_z \right),\]

with \(r = \sqrt{x^2 + y^2}\) and \(\theta = \arctan(y/x)\). \(x\) and \(y\) are particle position with respect to the center of the domain.

10.1.2. interface

allocate, deallocate:

void vcont_ini(MPI_Comm comm, int3 L, /**/ PidVCont **c);
void vcont_fin(/**/ PidVCont *cont);

set the parameters (to be done only once, before any other calls):

void vcont_set_params(float factor, float Kp, float Ki, float Kd, /**/ PidVCont *c); (1)
void vcont_set_target(float3 vtarget, /**/ PidVCont *c);  (2)
void vcont_set_cart(/**/ PidVCont *c);  (3)
void vcont_set_radial(/**/ PidVCont *c);  (4)
1 set the PID control variables. factor scales variables Kp, Ki` and Kd
2 set the target velocity (in transformed space)
3 set the transformation mode to cartesian
4 set the transformation mode to radial

The parameters can also be set through the configuration file:

void vcont_set_conf(const Config *cfg, /**/ PidVCont *c);

By default, the state of the controller is set to zero current velocity. It can also be set throught the restart mechanism:

void vcont_strt_dump(MPI_Comm, const char *base, int id, const PidVCont*); (1)
void vcont_strt_read(          const char *base, int id, PidVCont*);       (2)
1 dump the state of the controller to a text file
2 read the state from a text file

Control operations:

void   vcont_sample(const Coords *coords, int n, const Particle *pp, const int *starts, const int *counts, /**/ PidVCont *c); (1)
float3 vcont_adjustF(/**/ PidVCont *c); (2)
void   vcont_log(const PidVCont *c);    (3)
1 get average velocity according to transformation and store it
2 reduce sampled quantities and update the controller. return the force
3 logging informations; dumped into the file vcon.txt

10.1.3. configuration

syntax:

vcon = {
     active = true
     type   = "cart"          # can be also "rad"
     U      = [1.0, 1.0, 0.0] # target velocity in transformed space
     factor = 0.08
     Kp = 2.0
     Ki = 1.0
     Kd = 8.0
     log_freq    = 500
     adjust_freq = 500
     sample_freq = 1
}

10.2. Inflow

Mass rate inflow.

Create particles given a surface \(S(u, v)\), with a velocity profile \(\mathbf{U}(u,v)\) with \((u,v) \in (0,1) \times (0,1)\).

10.2.1. Method

Define "cumulative inflow" \(\phi(u, v)\) in a bin of area \(dA \in S\).

Updated as follow:

\[\phi^{n+1} = \phi^n + dt \rho dA \mathbf{u} \cdot \mathbf{n}\]

where \(\rho\) is the number density.

When \(\phi \geq 1\), a particle is released at the bin position with random velocity \(\mathcal{N} (\mathbf{u}, k_BT)\). We also set \(\phi \leftarrow \phi - 1\).

10.2.2. interface

allocate, deallocate:

void inflow_ini(int2 nc, Inflow **i);
void inflow_fin(Inflow *i);

initialize the inflow structure (to be done only once):

void inflow_ini_velocity(Inflow *i);

void inflow_ini_params_plate(const Coords *c, float3 o, int dir, float L1, float L2,
                             float3 u, bool upoiseuille, bool vpoiseuille,
                             /**/ Inflow *i);
void inflow_ini_params_circle(const Coords *c, float3 o, float R, float H, float u, bool poiseuille,
                              /**/ Inflow *i);

depending on the type of inflow, call the appropriate ini_params_xxx. ini_velocity must be called after initializing the parameters. it is used to setup the inlet velocity from parameters.

initialize from config parameters:

void inflow_ini_params_conf(const Coords *coords, const Config *cfg, /**/ Inflow *i);

create particles (at every time step):

void inflow_create_pp(float kBT, int numdensity, float dt, Inflow *i, int *n, Particle *pp); (1)
void inflow_create_pp_cc(float kBT, int numdensity, float dt, int newcolor, Inflow *i, int *n, Particle *pp, int *cc); (2)
1 add particles to the array pp and increase n accordingly
2 same as above, with colors cc set to color

10.2.3. configuration

plate

syntax for a 8.0 x 16.0 plate perpendicular to the x axis passing by the point (1, 2, 3):

inflow = {
    active      = true
    type        = "plate"
    L1          = 1.0
    L2          = 16.0
    direction   = 0                # [0,1,2] = [X,Y,Z]
    origin      = [1.0, 2.0, 3.0]
    upoiseuille = false            # true for parabolic profile along L1
    vpoiseuille = false            # true for parabolic profile along L2
    u           = [10.0, 0.0, 0.0]
}
circle

syntax for cylindrical surface:

inflow = {
    active = true
    type   = "circle"
    R      = 1.0              # radius
    H      = 16.0             # hight of the cylinder
    U      = 1.0              # maximum velocity
    center = [8.0, 8.0, 8.0]
    poiseuille = false        # parabolic profile along H if true
}

10.3. Outflow

Delete particles satisfying a predicate

10.3.1. interface

allocate, deallocate:

void outflow_ini(int maxp, /**/ Outflow **o);
void outflow_fin(/**/ Outflow *o);

set the outflow geometry type (to be done only once):

void outflow_set_params_circle(const Coords*, float3 c, float R, Outflow *o);
void outflow_set_params_plate(const Coords*, int dir, float r0, Outflow *o);

this can be done via configuration file:

void outflow_set_conf(const Config *cfg, const Coords*, Outflow *);

filter and mark particles (to be called at every time step):

void outflow_filter_particles(int n, const Particle *pp, /**/ Outflow *o); (1)

void outflow_download_ndead(Outflow *o);                                   (2)

int* outflow_get_deathlist(Outflow *o);                                    (3)
int  outflow_get_ndead(Outflow *o);                                        (4)
1 store dying particles infos inside Outflow struct
2 copy from device to host number of dead particles
3 return the list of dead marks (dead=1, alive=0)
4 return the number of dead particles

10.3.2. configuration

plate

will delete particles crossing a plate along the positive direction. syntax for the plate y=4:

outflow = {
    active     = true
    type       = "plate"
    direction  = 1      # [0,1,2] = [X,Y,Z]
    position   = 4.0    # position along "direction"
}
circle

delete particles going out of the cylinder. syntax for a cylinder of radius 8 centered at (8, 8, 8):

outflow = {
    active = true
    type   = "circle"
    R      = 8.0
    center = [8.0, 8.0, 8.0]
}

10.4. den: Density controller

Delete particles in high density regions

10.4.1. interface

allocate, deallocate:

void den_ini(int maxp, /**/ DCont**);
void den_map_ini(DContMap**);

void den_fin(DCont*);
void den_map_fin(DContMap*);

set the geometry type (to be done only once):

void den_map_set_none(const Coords*, DContMap*);
void den_map_set_circle(const Coords*, float R, DContMap*);

this can be done via configuration file:

void den_map_set_conf(const Config*, const Coords*, DContMap*);

filter and mark particles (to be called at every time step):

void outflow_filter_particles(int n, const Particle *pp, /**/ Outflow *o); (1)

void outflow_download_ndead(Outflow *o);                                   (2)

int* outflow_get_deathlist(Outflow *o);                                    (3)
int  outflow_get_ndead(Outflow *o);                                        (4)
1 reset the structure
2 internally mark particles in the regions with density higher than macdensity
3 internally download the number of dead particles

The above functions only compute and store the results internally. The computed quantities can be accessed via:

void outflow_filter_particles(int n, const Particle *pp, /**/ Outflow *o); (1)

void outflow_download_ndead(Outflow *o);                                   (2)

int* outflow_get_deathlist(Outflow *o);                                    (3)
int  outflow_get_ndead(Outflow *o);                                        (4)
1 return a pointer to internal buffer with the "death list"
2 return the number of particles killed by the filter operation

10.4.2. configuration

none

disabled density control everywhere

denoutflow = {
    active = true
    type = "none"
}
circle

delete particles going out of a cylinder centered at the center of the domain. syntax for a cylinder of radius 8:

denoutflow = {
    active = true
    type   = "circle"
    R      = 8.0
}

11. coords

Coordinate transforms utility.

In uDeviceX, the simulation domain is decomposed into a grid of subdomains of equal sizes. Computations are performed in local coordinates and all objects are relative to these coordinates. Operations, such as initialisation of the data, io, etc., need global coordinates informations.

For convenience, we consider the three coordinate systems:

  • Global: Relative to the lower corner of the full domain of size \((G_x, G_y, G_z)\)

  • Local: Relative to the center of the subdomain of size \((L_x, L_y, L_z)\)

  • Center: Relative to the center of the full domain: \((C_x, C_y, C_z) = (G_x/2, G_y/2, G_z/2)\)

11.1. data structures

All the needed information is stored in the hidden structure Coords. A view structure Coords_v is public to device code:

struct Coords_v {
    int xc, yc, zc; /* rank coordinates        */
    int xd, yd, zd; /* rank sizes              */
    int Lx, Ly, Lz; /* [L]ocal: subdomain size */
};

11.2. Host interface

11.2.1. Memory management

void coords_ini(MPI_Comm cart, int Lx, int Ly, int Lz, Coords **c); (1)
void coords_fin(Coords *c); (2)
1 allocate and set the data according to the cartesian communicator
2 deallocate the structure

The resources can also be allocated using configuration informations:

void coords_ini_conf(MPI_Comm cart, const Config *cfg, Coords **c);

11.2.2. Getters

Obtain a view structure from Coords:

void coords_get_view(const Coords *c, Coords_v *v);

Get \((G_x, G_y, G_z)\):

int xdomain(const Coords *c);
int ydomain(const Coords *c);
int zdomain(const Coords *c);

Get \((L_x, L_y, L_z)\):

int xs(const Coords*);
int ys(const Coords*);
int zs(const Coords*);
int3 subdomain(const Coords*);

Get (from local coordinates) \((-L_x/2, -L_y/2, -L_z/2)\) and \((L_x/2, L_y/2, L_z/2)\) in global coordinates:

int xlo(const Coords*);
int ylo(const Coords*);
int zlo(const Coords*);

int xhi(const Coords*);
int yhi(const Coords*);
int zhi(const Coords*);

11.2.3. Coordinate transforms

From local coordinates to center coordinates:

float xl2xc(const Coords *c, float xl);
float yl2yc(const Coords *c, float yl);
float zl2zc(const Coords *c, float zl);
void local2center(const Coords *c, float3 rl, /**/ float3 *rc);

From center coordinates to local coordinates:

float xc2xl(const Coords *c, float xc);
float yc2yl(const Coords *c, float yc);
float zc2zl(const Coords *c, float zc);
void center2local(const Coords *c, float3 rc, /**/ float3 *rl);

From local coordinates to global coordinates:

float xl2xg(const Coords *c, float xl);
float yl2yg(const Coords *c, float yl);
float zl2zg(const Coords *c, float zl);
void local2global(const Coords *c, float3 rl, /**/ float3 *rg);

From global coordinates to local coordinates:

float xg2xl(const Coords *c, float xg);
float yg2yl(const Coords *c, float yg);
float zg2zl(const Coords *c, float zg);
void global2local(const Coords *c, float3 rg, /**/ float3 *rl);

11.2.4. helpers

/* rank predicates */
bool is_end(const Coords *c, int dir);

/* a string unique for a rank */
void coord_stamp(const Coords *c, /**/ char *s);

/* number of subdomains */
int coords_size(const Coords *c);

11.3. Configuration

Set subdomain sizes:

glb = {
    L = [16, 12, 24]
}

12. generic [d]evice API

The goal is to make cuda API and kernel functions generic. Meaning they can be replaced by CPU calls.

There are two groups: functions which "mirror" cuda API:

int GetSymbolAddress(void **devPtr, const void *symbol);
int Malloc(void **p, size_t);
int MemcpyToSymbol(const void *symbol, const void *src, size_t count, size_t offset=0, int kind=MemcpyHostToDevice);
int MemcpyFromSymbol(void *dst, const void *symbol, size_t count, size_t offset=0, int kind=MemcpyDeviceToHost);
int HostGetDevicePointer(void **pDevice, void *pHost, unsigned int flags);
int Memcpy (void *dst, const void *src, size_t count, int kind);
int MemsetAsync (void *devPtr, int value, size_t count, Stream_t stream=0);
int Memset (void *devPtr, int value, size_t count);
int MemcpyAsync (void * dst, const void * src, size_t count, int kind, Stream_t stream = 0);
int Free (void *devPtr);
int FreeHost (void *hstPtr);
int DeviceSynchronize (void);
int PeekAtLastError(void);

functions unique for uDeviceX:

const char *emsg();
int alloc_pinned(void **pHost, size_t size);
int is_device_pointer(const void *ptr);

files of the interface - d/api.h API calls - d/q.h function and variable type qualifiers - d/ker.h

if DEV_CUDA is defined interfaces is implimented by cuda, if DEV_CPU is defined it is implimented by host calls.

13. debug

Helper to check the particle positions, velocities, forces and cell-lists from device arrays

The hidden structure Dbg holds the debugging configuration. Configuration is encoded using the following enum values:

enum {
    DBG_POS,      (1)
    DBG_POS_SOFT, (2)
    DBG_VEL,      (3)
    DBG_FORCES,   (4)
    DBG_COLORS,   (5)
    DBG_CLIST,    (6)
    DBG_NKIND_
};
1 positions are inside subdomain
2 positions are inside subdomain plus a margin
3 velocities
4 forces
5 colors
6 cell lists

Consider subdomain of dimensions \((L_x, Ly, Lz)\). A position \((r_x, r_y, r_z)\) is valid if

\[-\frac{L_\alpha - M} {2} \leq r_\alpha < \frac{L_\alpha + M} 2,\]

where \(M\) is the margin. It is \(M = 0\) for DBG_POS and \(M = 3\) for DBG_POS_SOFT.

For a given timestep \(dt\), the velocity \((v_x, v_y, v_z)\) is valid if

\[2 |v_\alpha| dt \leq L_\alpha.\]

The force \((f_x, f_y, f_z)\) is valid if

\[2 |f_\alpha| dt^2 \leq L_\alpha.\]

13.1. interface

allocate/deallocate the structure:

void dbg_ini(Dbg**);
void dbg_fin(Dbg*);

set debugging modes and verbosity:

void dbg_enable(int kind, Dbg *dbg);
void dbg_disable(int kind, Dbg *dbg);
void dbg_set_verbose(bool, Dbg *dbg);
void dbg_set_dump(bool, Dbg *dbg);

initialize from configuration:

void dbg_set_conf(const Config*, Dbg*);

interface:

void dbg_check_pos(const Coords *c, const char *base, const Dbg *dbg, int n, const Particle *pp);
void dbg_check_pos_soft(const Coords *c, const char *base, const Dbg *dbg, int n, const Particle *pp);
void dbg_check_vel(float dt, const Coords *c, const char *base, const Dbg *dbg, int n, const Particle *pp);
void dbg_check_forces(float dt, const Coords *c, const char *base, const Dbg *dbg, int n, const Particle *pp, const Force *ff);
void dbg_check_colors(const Coords *c, const Dbg *dbg, int n, const int *cc);
void dbg_check_clist(const Coords *c, const Dbg *dbg, int3 L, const int *starts, const int *counts, const Particle *pp);

13.2. configuration

dbg = {
    verbose  = true;
    pos      = false;
    pos_soft = false;
    vel      = false;
    forces   = false;
    colors   = false;
    clist    = false;
};

14. distr

Redistribution of quantities across nodes. Each node has a local coordinate system. The origin is the center of a "subdomain" of size Sx, Sy, Sz.

A quantity is sent to a neighboring rank if its position is outside of the subdomain.

The workflow is very similar for every quantity to redistribute:

  • build a map from the positions of the quatities and store it in a Map structure.

  • pack the data using Map in a Pack structure

  • communicaion: exchange packed data with neighbors, receive data into Unpack structure. This is done using the generic communicator.

  • Unpack the data from Unpack to quantities.

14.1. map

Helper for packing data to send buffers. This is common to all quantities.

14.1.1. data structure

The structure to map local data from an array to packed data in 27 arrays is implemented as:

/* [D]istr Map */
struct DMap {
    int *counts;  /* number of entities leaving in each fragment */
    int *starts;  /* cumulative sum of the above                 */
    int *ids[27]; /* indices of leaving objects                  */

    int *hcounts; /* counts on host (pinned) */
};

This can be allocated on device or on host memory (see interface).

14.1.2. interface

Host interface:

void dmap_ini(int nfrags, const int capacity[], /**/ DMap *m);
void dmap_fin(int nfrags, /**/ DMap *m);
void dmap_reini(int nfrags, /**/ DMap m);
void dmap_download_counts(int nfrags, /**/ DMap *m);
void dmap_ini_host(int nfrags, const int capacity[], /**/ DMap *m);
void dmap_fin_host(int nfrags, /**/ DMap *m);
void dmap_reini_host(int nfrags, /**/ DMap m);
void dmap_D2H(int nfrags, const DMap *d, /**/ DMap *h);

Device interface:

/* exclusive scan */
template <int NCOUNTS>
static __global__ void dmap_scan(/**/ DMap m)              (1)
static __device__ int dmap_get_fid(int3 L, const float r[3])       (2)
static __device__ void dmap_add(int pid, int fid, DMap m)  (3)
1 prefix sum on counts to obtain starts
2 get fragment id from position
3 add a quantity to the map

14.2. flu

Redistribute solvent particles accross nodes.

14.2.1. data structures

Solvent distribution follow the common workflow described above and contains the three hidden generic data structures:

struct DFluPack {
    DMap map;
    dBags dbags[MAX_NBAGS], *dpp, *dii, *dcc;
    hBags hbags[MAX_NBAGS], *hpp, *hii, *hcc;
    int nbags;
    int nhalo; /* number of sent particles */
    int3 L; /* subdomain size */
    CommBuffer *hbuf;
};

struct DFluComm {
    Comm *comm;
};

struct DFluUnpack {
    hBags hbags[MAX_NBAGS], *hpp, *hii, *hcc;
    int nbags;
    Particle *ppre;
    int *iire, *ccre;
    int nhalo; /* number of received particles */
    int3 L; /* subdomain size */
    CommBuffer *hbuf;
};

where [d,h]xx means quantities on [device,host] and pp corresponds to particles, cc to colors and ii to global ids. The xxre variables in Unpack are buffer for remote quantities.

14.2.2. interface

void dflu_pack_ini(bool colors, bool ids, int3 L, int maxdensity, DFluPack **p);
void dflu_comm_ini(MPI_Comm comm, /**/ DFluComm **c);
void dflu_unpack_ini(bool colors, bool ids, int3 L, int maxdensity, DFluUnpack **u);

void dflu_pack_fin(DFluPack *p);
void dflu_comm_fin(DFluComm *c);
void dflu_unpack_fin(DFluUnpack *u);

/* map */
void dflu_build_map(int n, const PartList lp, DFluPack *p);

/* pack */
void dflu_pack(const FluQuants *q, /**/ DFluPack *p);
void dflu_download(DFluPack *p, /**/ DFluStatus *s);

/* communication */
void dflu_post_recv(DFluComm *c, DFluUnpack *u);
void dflu_post_send(DFluPack *p, DFluComm   *c);
void dflu_wait_recv(DFluComm *c, DFluUnpack *u);
void dflu_wait_send(DFluComm *c);

/* unpack */
void dflu_unpack(/**/ DFluUnpack *u);

The map is build from partlist, allowing the process to delete particles.

The solvent distribution also include cell list generation (see cell lists). It is done in two phases:

  • subindices, independant for bulk and remote particles

  • gather, which needs the subindices to be computed

Bulk subindices can be computed while comunicating remote particles.

/* cell lists */
void dflu_bulk(PartList lp, /**/ FluQuants *q);
void dflu_halo(const DFluUnpack *u, /**/ FluQuants *q);
void dflu_gather(int ndead, const DFluPack *p, const DFluUnpack *u, /**/ FluQuants *q);

14.3. rbc

Redistribute red blood cells accross nodes according to the center of mass of their bounding box (see minmax).

14.3.1. data structures

RBC distribution follows the common workflow described above and contains the three hidden generic data structures:

enum {
    MAX_NHBAGS = 2,
    MAX_NDBAGS = 1
};

struct DRbcPack {
    DMap map;
    float3 *minext, *maxext;
    dBags dbags[MAX_NDBAGS], *dpp;
    hBags hbags[MAX_NHBAGS], *hpp, *hii;
    int nbags;
    CommBuffer *hbuf;
    DMap hmap; /* host map for ids */

    int3 L; /* subdomain size */
};

struct DRbcComm {
    Comm *comm;
};

struct DRbcUnpack {
    hBags hbags[MAX_NHBAGS], *hpp, *hii;
    int nbags;
    CommBuffer *hbuf;

    int3 L; /* subdomain size */
};

where minext and maxext contain the bounding box informations of each RBC. Other variables have the same naming conventions than distr::flu.

14.3.2. interface

void drbc_pack_ini(bool ids, int3 L, int maxc, int nv, DRbcPack **p);
void drbc_comm_ini(MPI_Comm comm, /**/ DRbcComm **c);
void drbc_unpack_ini(bool ids, int3 L, int maxc, int nv, DRbcUnpack **u);

void drbc_pack_fin(DRbcPack *p);
void drbc_comm_fin(DRbcComm *c);
void drbc_unpack_fin(DRbcUnpack *u);

void drbc_build_map(int nc, int nv, const Particle *pp, DRbcPack *p);
void drbc_pack(const RbcQuants *q, /**/ DRbcPack *p);
void drbc_download(DRbcPack *p);

void drbc_post_recv(DRbcComm *c, DRbcUnpack *u);
void drbc_post_send(DRbcPack *p, DRbcComm *c);
void drbc_wait_recv(DRbcComm *c, DRbcUnpack *u);
void drbc_wait_send(DRbcComm *c);

void drbc_unpack_bulk(const DRbcPack *p, /**/ RbcQuants *q);
void drbc_unpack_halo(const DRbcUnpack *u, /**/ RbcQuants *q);

Note that unpack_bulk can be done before unpacking halo quantities.

14.4. rig

Redistribute rigid objects accross nodes according to their center of mass.

14.4.1. data structures

Rigid bodies distribution is very similar to the above. The hidden data structures fit again in the general workflow:

/*
 ipp: particles of the mesh
 ss:  "Rigid" structures
*/

enum {
    ID_PP,
    ID_SS,
    MAX_NBAGS
};

struct DRigPack {
    DMap map;
    dBags dbags[MAX_NBAGS], *dipp, *dss;
    hBags hbags[MAX_NBAGS], *hipp, *hss;
    CommBuffer *hbuf;
    int nbags;
    int3 L;  /* subdomain size */
};

struct DRigComm {
    Comm *comm;
};

struct DRigUnpack {
    hBags hbags[MAX_NBAGS], *hipp, *hss;
    CommBuffer *hbuf;
    int nbags;
    int3 L;  /* subdomain size */
};

14.4.2. interface

void drig_pack_ini(int3 L, int maxns, int nv, DRigPack **p);
void drig_comm_ini(MPI_Comm comm, /**/ DRigComm **c);
void drig_unpack_ini(int3 L, int maxns, int nv, DRigUnpack **u);

void drig_pack_fin(DRigPack *p);
void drig_comm_fin(DRigComm *c);
void drig_unpack_fin(DRigUnpack *u);

void drig_build_map(int ns, const Rigid *ss, /**/ DRigPack *p);
void drig_pack(int ns, int nv, const Rigid *ss, const Particle *ipp, /**/ DRigPack *p);
void drig_download(DRigPack *p);

void drig_post_recv(DRigComm *c, DRigUnpack *u);
void drig_post_send(DRigPack *p, DRigComm *c);
void drig_wait_recv(DRigComm *c, DRigUnpack *u);
void drig_wait_send(DRigComm *c);

void drig_unpack_bulk(const DRigPack *p, /**/ RigQuants *q);
void drig_unpack_halo(const DRigUnpack *u, /**/ RigQuants *q);

Note that unpack_bulk can be done before unpacking halo quantities.

14.5. common

helpers: common kernels

void dcommon_pack_pp_packets(int nc, int nv, const Particle *pp, DMap m, /**/ Sarray<Particle*, 27> buf); (1)
void dcommon_shift_one_frag(int3 L, int n, const int fid, /**/ Particle *pp); (2)
void dcommon_shift_halo(int3 L, int nhalo, const Sarray<int, 27> starts, /**/ Particle *pp); (3)
1 pack nc packets of nv particles into 27 buffers buf according to map
2 shift particles in the fragment direction
3 shift all particles according to the fragment direction

15. exch

Exchange quantities across nodes for "ghost particles".

A quantity is exchanged with a neighboring node if its position is within a given distance from the subdomain boundaries.

The workflow is very similar for every quantity to exchange:

  • build a map from the positions of the quatities and store it in a Map structure.

  • pack the data using Map in a Pack structure

  • communicaion: exchange packed data with neighbors, receive data into Unpack structure. This is done using the generic communicator.

  • Unpack the data from Unpack to quantities.

Optionally, other quantities can be exchanged back, e.g. forces.

15.1. map

Helper for packing data to send buffers. This is common to all quantities. As opposed to the distr module, the mapping is not one to one. A single quantity can be exchanged with up to 7 neighboring nodes if it is in a corner (exchange with 3 faces, 3 edges and one corner).

15.1.1. data structure

A single structure is able to build a map for up to nw objects (e.g. rbc and rigid give nw = 2)

enum {MAX_FRAGS=26};

struct EMap {
    int *counts;         /* number of entities leaving in each fragment */
    int *starts;         /* cumulative sum of the above                 */
    int *offsets;        /* offsets per fragment for each solute        */
    int *ids[MAX_FRAGS]; /* indices of leaving objects                  */
    int *cap;            /* capacity of ids                             */
};

for nw objects and nfrags fragments, the counts, starts and offsets arrays all have the same size n = (nw + 1) * (nfrags + 1) for convenience.

The nfrags + 1 part is for the starts, which is exclusive scan of counts in the fragment dimension. Therefore the last element is the sum of counts, i.e. number of entities for a given object.

The nw + 1 part is for offsets, which are an exclusive prefix sum of counts in the object id dimension: the last row is the number of entities of all objects leaving per fragment.

example: Given nw = 2 objects, with nfrags = 5 fragments, with the following counts:

object 0: 0 2 2 1 4
object 1: 1 1 0 0 3
-> counts = 0 2 2 1 4 _  1 1 0 0 3 _  _ _ _ _ _ _
-> starts = 0 0 2 4 5 9  0 1 1 1 1 4  _ _ _ _ _ _

where _ stands for "not set".

The offset array is then given by (we rewrite counts row by row for better understanding):

counts  = 0 2 2 1 4 _
          1 1 0 0 3 _
          _ _ _ _ _ _

offsets = 0 0 0 0 0 _
          0 2 2 1 4 _
	  1 3 2 1 7 _

starts and offsets are results of the scan operation (see below).

The index j used to retrieve the ith entity of a given object in a given fragment is then given by

j = offsets[oid][fid] + starts[oid][fid] + i

where oid is the object id, fid the fragment. note that above, [oid][fid] is implemented as [oid * (nfrags + 1) + fid].

The id is then ids[fid][j].

15.1.2. interface

Host interface:
void emap_ini(int nw, int nfrags, int cap[], /**/ EMap *map);               (1)
void emap_fin(int nfrags, EMap *map);                                       (2)
void emap_reini(int nw, int nfrags, /**/ EMap map);                         (3)
void emap_scan(int nw, int nfrags, /**/ EMap map);                          (4)
void emap_download_tot_counts(int nw, int nfrags, EMap map, /**/ int counts[]); (5)
void emap_download_all_counts(int nw, int nfrags, EMap map, /**/ int tot_counts[], int *counts[]);
1 allocate the map structure on device
2 deallocate the map structure
3 reset the map structure
4 scan the map structure to get starts and offsets
5 copy counts (total number of entities per fragment) from device to host
Device interface:
_I_ int emap_code(int3 L, const float r[3]) (1)
_I_ int emap_code_box(int3 L, float3 lo, float3 hi)  (2)
_I_ void emap_add(int nfrags, int soluteid, int pid, int fid, EMap m) (3)
_I_ int emap_decode(int code, /**/ int fids[MAX_DSTS]) (4)
1 get code (local fragment id) from position
2 get code (local fragment id) from box positions
3 add an entity id pid to the map in fragment fid and object soluteid. Does not add if the capacity is exceeded
4 get destination fragments from code

15.2. flu

Exchange solvent particles within a cutoff radius from the neighboring nodes.

15.2.1. interface

allocate, deallocate the structures:

void eflu_pack_ini(bool colors, int3 L, int maxd, EFluPack **p);
void eflu_comm_ini(MPI_Comm comm, /**/ EFluComm **c);
void eflu_unpack_ini(bool colors, int3 L, int maxd, EFluUnpack **u);

void eflu_pack_fin(EFluPack *p);
void eflu_comm_fin(EFluComm *c);
void eflu_unpack_fin(EFluUnpack *u);

build the map:

void eflu_compute_map(const int *start, const int *count, /**/ EFluPack *p);
void eflu_download_cell_starts(/**/ EFluPack *p);

pack and copy data on host:

void eflu_pack(const PaArray *parray, /**/ EFluPack *p);
void eflu_download_data(EFluPack *p);

communicate the packed data with neighbors:

void eflu_post_recv(EFluComm *c, EFluUnpack *u);
void eflu_post_send(EFluPack *p, EFluComm *c);
void eflu_wait_recv(EFluComm *c, EFluUnpack *u);
void eflu_wait_send(EFluComm *c);

void eflu_unpack(EFluUnpack *u);

using flu::LFrag26;
using flu::RFrag26;

void eflu_get_local_frags(const EFluPack *p, /**/ LFrag26 *lfrags);
void eflu_get_remote_frags(const EFluUnpack *u, /**/ RFrag26 *rfrags);

unpack and get data informations needed by the fluforces module:

void eflu_unpack(EFluUnpack *u);

using flu::LFrag26;
using flu::RFrag26;

void eflu_get_local_frags(const EFluPack *p, /**/ LFrag26 *lfrags);
void eflu_get_remote_frags(const EFluUnpack *u, /**/ RFrag26 *rfrags);

15.3. obj

Object exchanger is used to exchange particles from objects (rbc or rig) with neighboring nodes for computing their interactions with the solvent (fsi) or with other objects (contact forces). It sends the remote forces back to the local node.

Sending back the forces is optional.

15.3.1. particle exchange interface

allocate, deallocate the structures:

void eobj_pack_ini(int3 L, int nw, int maxd, int maxpsolid, EObjPack **p);
void eobj_comm_ini(MPI_Comm cart, /**/ EObjComm **c);
void eobj_unpack_ini(int3 L, int nw, int maxd, int maxpsolid, EObjUnpack **u);

void eobj_pack_fin(EObjPack *p);
void eobj_comm_fin(EObjComm *c);
void eobj_unpack_fin(EObjUnpack *u);

build the map from array of wrappers PaWrap containing array of particles:

void eobj_build_map(int nw, const PaWrap *ww, /**/ EObjPack *p);

pack the data:

void eobj_pack(int nw, const PaWrap *ww, /**/ EObjPack *p);
void eobj_download(int nw, EObjPack *p);

communicate the packed data with neighboring nodes:

void eobj_post_recv(EObjComm *c, EObjUnpack *u);
void eobj_post_send(EObjPack *p, EObjComm *c);
void eobj_wait_recv(EObjComm *c, EObjUnpack *u);
void eobj_wait_send(EObjComm *c);

retrieve information about the received data:

int26 eobj_get_counts(EObjUnpack *u); (1)
void  eobj_get_all_counts(int nw, EObjUnpack *u, int *all_counts);
Pap26 eobj_upload_shift(EObjUnpack *u); (2)
Fop26 eobj_reini_ff(const EObjUnpack *u, EObjPackF *pf); (3)
1 get the number of particles inside each fragment
2 upload particles on device, shift them in local coordinates and return device pointers
3 set forces on device to 0 and return device pointers

15.3.2. force back sender interface

allocate, deallocate the optional structures:

void eobj_packf_ini(int3 L, int maxd, int maxpsolid, EObjPackF **p);
void eobj_commf_ini(MPI_Comm cart, /**/ EObjCommF **c);
void eobj_unpackf_ini(int3 L, int maxd, int maxpsolid, EObjUnpackF **u);

void eobj_packf_fin(EObjPackF *p);
void eobj_commf_fin(EObjCommF *c);
void eobj_unpackf_fin(EObjUnpackF *u);

download the forces on host:

void eobj_download_ff(EObjPackF *p);

communicate back the forces to neighbours:

void eobj_post_recv_ff(EObjCommF *c, EObjUnpackF *u);
void eobj_post_send_ff(EObjPackF *p, EObjCommF *c);
void eobj_wait_recv_ff(EObjCommF *c, EObjUnpackF *u);
void eobj_wait_send_ff(EObjCommF *c);

unpack forces arrays inside wrappers FoWrap.

void eobj_unpack_ff(EObjUnpackF *u, const EObjPack *p, int nw, /**/ FoWrap *ww);

15.4. mesh

exchange full mesh particles with neighbouring nodes when bounding box of the mesh crosses or is close enough to the subdomain boundaries. This is used for mesh bounce back or solvent recoloring.

It can optionally send back momentum data per triangles to original node.

15.4.1. mesh exchanger interface

allocate, deallocate:

void emesh_pack_ini(int3 L, int nv, int max_mesh_num, EMeshPack **p);
void emesh_comm_ini(MPI_Comm comm, /**/ EMeshComm **c);
void emesh_unpack_ini(int3 L, int nv, int max_mesh_num, EMeshUnpack **u);

void emesh_pack_fin(EMeshPack *p);
void emesh_comm_fin(EMeshComm *c);
void emesh_unpack_fin(EMeshUnpack *u);

build map:

void emesh_build_map(int nm, int nv, const Particle *pp, /**/ EMeshPack *p);

pack data:

void emesh_pack_pp(int nv, const Particle *pp, /**/ EMeshPack *p);
void emesh_pack_rrcp(int nv, const Positioncp *rr, /**/ EMeshPack *p);
void emesh_download(EMeshPack *p);

communicate data with neigbours:

void emesh_post_recv(EMeshComm *c, EMeshUnpack *u);
void emesh_post_send(EMeshPack *p, EMeshComm *c);
void emesh_wait_recv(EMeshComm *c, EMeshUnpack *u);
void emesh_wait_send(EMeshComm *c);

unpack the data to a single particle array:

void emesh_unpack_pp(int nv, const EMeshUnpack *u, /**/ int *nmhalo, Particle *pp);
void emesh_unpack_rrcp(int nv, const EMeshUnpack *u, /**/ int *nmhalo, Positioncp *rr);

the returned value nmhalo is the number of received meshes

get number of mesh per fragment:

void emesh_get_num_frag_mesh(const EMeshUnpack *u, /**/ int cc[NFRAGS]);

15.4.2. back momentum sender interface

allocate, deallocate the optional structures:

void emesh_packm_ini(int num_mom_per_mesh, int max_mesh_num, EMeshPackM **p);
void emesh_commm_ini(MPI_Comm comm, /**/ EMeshCommM **c);
void emesh_unpackm_ini(int num_mom_per_mesh, int max_mesh_num, EMeshUnpackM **u);

void emesh_packm_fin(EMeshPackM *p);
void emesh_commm_fin(EMeshCommM *c);
void emesh_unpackm_fin(EMeshUnpackM *u);

pack the momentum infos

void emesh_packM(int nt, const int counts[NFRAGS], const Momentum *mm, /**/ EMeshPackM *p);
void emesh_downloadM(const int counts[NFRAGS], EMeshPackM *p);

communicate back the momentum infos:

void emesh_post_recv(EMeshCommM *c, EMeshUnpackM *u);
void emesh_post_send(EMeshPackM *p, EMeshCommM *c);
void emesh_wait_recv(EMeshCommM *c, EMeshUnpackM *u);
void emesh_wait_send(EMeshCommM *c);

unpack the momentum infos to a single array:

void emesh_upload(EMeshUnpackM *u);
void emesh_unpack_mom(int nt, const EMeshPack *p, const EMeshUnpackM *u, /**/ Momentum *mm);

15.5. common

helper for common exchange operations

void ecommon_pack_pp(const Particle *pp, PackHelper ph, /**/ Pap26 buf); (1)
void ecommon_shift_pp_one_frag(int3 L, int n, const int fid, /**/ Particle *pp); (2)
void ecommon_shift_rrcp_one_frag(int3 L, int n, const int fid, /**/ Positioncp *rr);
1 pack particles pp into 27 buffers buf according to the local map ph
2 shift particles in the fragment direction

The local map is defined through the structure

struct PackHelper {
    int *starts;
    int *offsets;
    int *indices[NFRAGS];
    int *cap;
};

which is a map for a single set of quantities.

16. flu: solvent particles

Manages solvent particles data.

16.1. data structure

The data is stored in a visible structure. The user can access directly the fields.

struct FluQuants {
    Particle *pp, *pp0;    /* particles on device  */
    int       n;           /* particle number      */
    Clist     *cells;      /* cell lists           */
    ClistMap *mcells;      /* cell lists map       */
    Particle *pp_hst;      /* particles on host    */

    /* optional data */

    bool ids, colors;

    int *ii, *ii0;  /* global ids on device */
    int *ii_hst;    /* global ids on host   */

    int *cc, *cc0;  /* colors on device */
    int *cc_hst;    /* colors on host   */

    int maxp; /* maximum particle number */
};

16.2. interface

allocate and deallocate the data:

void flu_ini(bool colors, bool ids, int3 L, int maxp, FluQuants *q);
void flu_fin(FluQuants *q);

generate the quantities:

void flu_gen_quants(const Coords*, int numdensity, const GenColor *gc, FluQuants *q); (1)
void flu_gen_ids(MPI_Comm, const int n, FluQuants *q); (2)
1 generate numberdensity particles randomly distributed in every cell with zero velocity. Colors are optionally set from GenColor object (see color).
2 generate unique global ids for each particle

restart interface (see restart):

void flu_strt_quants(MPI_Comm, const char *base, const int id, FluQuants *q);     (1)
void flu_strt_dump(MPI_Comm, const char *base, const int id, const FluQuants *q); (2)
1 read quantities from restart files
2 dump quantities to restart files

additional tools:

void flu_txt_dump(const Coords*, const FluQuants *q); (1)

/* build cells only from one array of particles fully contained in the domain */
/* warning: this will delete particles which are outside                      */
void flu_build_cells(/**/ FluQuants *q); (2)
1 dump particles to text format, one file per MPI rank (debug purpose)
2 build cells lists, assume that particles are inside subdomain

17. fluforces

solvent forces

17.1. interface

allocate and free bulk structure:

void fluforces_bulk_ini(int3 L, int maxp, /**/ FluForcesBulk **b);
void fluforces_bulk_fin(/**/ FluForcesBulk *b);

compute bulk interactions:

void fluforces_bulk_prepare(int n, const PaArray *a, /**/ FluForcesBulk *b);
void fluforces_bulk_apply(const PairParams*, int n, const FluForcesBulk *b, const int *start, const int *count,
                          /**/ const FoArray *ff);

allocate and free halo structure:

void fluforces_halo_ini(MPI_Comm cart, int3 L, /**/ FluForcesHalo **hd);
void fluforces_halo_fin(/**/ FluForcesHalo *h);

compute halo interactions:

void fluforces_halo_prepare(flu::LFrag26 lfrags, flu::RFrag26 rfrags, /**/ FluForcesHalo *h);
void fluforces_halo_apply(const PairParams*, const FluForcesHalo *h, /**/ const FoArray *farray);

17.2. submodules

17.2.1. bulk

compute local interactions of solvent particles, given cell starts, cell counts (see clist) and particles array

The algorithm is as follows:

  • each particle of the array has one thread assigned.

  • the thread runs over half of the 27 cell lists overlapping one cutoff radius (this makes 13 cells + the "self cell")

  • in each cell, the particle ids are fetched from the cell starts array

  • run over particles, atomic add the forces to them and gather forces on the main particle

  • atomic add main forces to force array

Performance is strongly related to the pattern used to run over the particles. Better performance is achieved by grouping as much as possible consecutive particles (row), meaning that the x cell index runs fastest.

The cell run order is shown below:

+----> x   plane dz = -1:    plane dz =  0:    plane dz = +1:
|
|            00 01 02          09 10 11          xx xx xx
v y          03 04 05          12 13 xx          xx xx xx
             06 07 08          xx xx xx          xx xx xx

where xx denotes that the cell is not used by the current thread.

17.2.2. halo

solvent forces with remote particles (fragments)

main kernel performs interactions for a given halo:

  • fetch generalized particle a

  • build map (how to access neighboring particles?)

  • loop over neighboring particles according to map

  • compute pairwise interactions

  • accumulate to force a in output force array

18. frag

Fragment functions on host and device

18.1. purpose

A fragment designates one of the 27 directions \((dx, dy, dz)\), where \(dx, dy, dz \in \{-1, 0, 1\}\).

The fragment \((0, 0, 0)\) is called bulk and has the id frag_bulk.

The purpose of this module is to encode and decode the fragments between fragment ids (numbered from 0 to 26) and directions.

18.2. interface

The major part of the functions are available to both device and host, within the namespaces fragdev and fraghst, respectively.

int i2dx(int i);                 (1)
int i2dy(int i);                 (2)
int i2dz(int i);                 (3)
void i2d3(int i, /**/ int d[3]); (4)
1 get \(dx\) from fragment id
2 get \(dy\) from fragment id
3 get \(dz\) from fragment id
4 get \((dx, dy, dz)\) from fragment id
int d2i(int x, int y, int z); (1)
int d32i(const int d[3]);     (2)
1 get fragment id from \((dx, dy, dz)\)
2 idem with array as input
int ncell(int3 L, int i); (1)
1 given sub-domain sizes L, returns number of neighboring cells to fragment with id i
int antid2i(int x, int y, int z); (1)
int anti(int i);                  (2)
1 given \((dx, dy, dz)\), return the id of the anti fragment (i.e. with direction \((-dx, -dy, -dz)\))
2 given a fragment id, returns the id of the anti fragment

Only on host:

void estimates(int3 L, int nfrags, float maxd, /**/ int *cap); (1)
1 given a maximum number density maxp, return an estimate of the number of particles in each fragment layer.

19. fsi : flow structure interactions

flow-structure interactions between solvent and objects (rigid objects, rbc membranes)

19.1. interface

allocate/deallocate

void fsi_ini(int rank, int3 L, /**/ Fsi **fsi);
void fsi_fin(Fsi *fsi);

computing forces:

void fsi_bind_solvent(PaArray pa, Force *ff, int n, const int *starts, /**/ Fsi *fsi);  (1)
void fsi_bulk(Fsi *fsi, int nw, const PairParams **prms, PaWrap *pw, FoWrap *fw); (2)
void fsi_halo(Fsi *fsi, int nw, const PairParams **prms, Pap26 all_pp, Fop26 all_ff, const int *all_counts); (3)
1 store solvent interactions inside hidden structure Fsi
2 compute the interactions between local objects and local solvent
3 compute the interactions between remote objects and local solvent

20. grid_sampler: particle to grid average

gather particle data on a grid and average in time. The particle to grid process is simple binning.

20.1. quantities

The number density and velocity (3 components) are computed by default. Optionally, the stress tensor (6 components) and the color proportions can be sampled as well.

All cell quantities computed at a given time \(t\) are averaged on time:

\[Q^i = \frac 1 T \sum\limits_{t = 1}^T Q^i_t\]

The cell quantities \(Q^i_t\) are averaged on space, depending on the quantity. Let \(j \in [1, n_i\)] denote the particle ids inside cell \(i\), where \(n_i\) is the number of particles inside this cell. \(V_i\) denotes the volume of the cell.

Velocity:

\[\mathbf{u}^i_t = \frac{1}{n_i} \sum\limits_{j = 1}^{n_i} \mathbf{u}_j\]

number density:

\[\rho^i_t = \frac{n_i}{V_i}\]

stress:

\[S^i_{\alpha, \beta} = \frac{1}{V_i} \sum\limits_{j = 1}^{n_i} S_j^{\alpha, \beta}\]

color proportion:

\[c^i_{\alpha} = \frac{1}{n_i} \sum\limits_{j = 1}^{n_i} \delta_{c_j, \alpha}\]

20.2. interface

Allocate and free the structure:

void grid_sampler_ini(bool colors, bool stress, int3 L, int3 N, GridSampler**); (1)
void grid_sampler_fin(GridSampler*); (2)
1 create a sampler for binning a subdomain of size L. N is the number of bins. Allocate space for stress when stress is set to true. Allocate space for the color proportions when colors is set to true.
2 deallocate the structure

The sampler object stores the grid data. All operations are performed on device:

void grid_sampler_reset(GridSampler*); (1)
void grid_sampler_add(const GridSampleData*, GridSampler*); (2)
void grid_sampler_dump(MPI_Comm, const char *dir, long id, GridSampler*); (3)
1 set all fields of the grids to 0
2 add contribution of the data (see grid sampler data) to the time averaged grid
3 perform time average of all gathered data, download grid on host and dump to file (see grid).

20.3. grid sampler data

helper to pass data to the grid sampler.

allocate and deallocate the data structure:

void grid_sampler_data_ini(GridSampleData **);
void grid_sampler_data_fin(GridSampleData  *);

pass data to the structure:

void grid_sampler_data_reset(GridSampleData *); (1)
void grid_sampler_data_push(long n, const Particle *pp, const int *cc,
                            const float *ss, GridSampleData *); (2)
1 remove all previously pushed data
2 add data (pointers only, not a deep copy) to the structure. The pp pointer is mandatory. The stress ss pointer is used only when stress is activated in grid sampler. The colors cc pointer is only used when colors is activated in the grid sampler.

21. IO

21.1. grid

Parallel-IO grid data dump using the hdf5 library and a xmf file description.

21.1.1. interface

The following function dumps grid data with subGrid points per node mapped to the subdomain coordinates of size subDomain. Two files DUMP_BASE/dir/id.xmf and DUMP_BASE/dir/id.h5 are dumped.

void grid_write(int3 subGrid, int3 subDomain, MPI_Comm, const char *path, int ncmp, const float **data, const char **names);
  • ncmp: number of variables

  • data: data to dump (structure of array fashion)

  • names: variable names

21.1.2. submodules

xmf writer, to be called by one rank only:

void xmf_write(int3 domainSize, int3 gridSize, const char *path, int ncomp, const char **names);

h5 writer, collective write operation:

void h5_write(int3 N, MPI_Comm, const char *path, int ncomp, const float **data, const char **names);

N is the grid size.

21.2. Low level MPI file write

int write_file_open(MPI_Comm, const char *path, /**/ WriteFile**);
int write_file_close(WriteFile*);

void write_all(MPI_Comm, const void *const, int nbytes, WriteFile*); (1)
int write_master(MPI_Comm, const void *const, int nbytes, WriteFile*); (2)

int write_shift_indices(MPI_Comm, int, /**/ int*);
int write_reduce(MPI_Comm, int, /**/ int*);
1 write nbytes from every prcesses
2 write nbytes onlye from master process (rank = 0)

21.3. Text

Write one row per particle in a text file path.

void txt_write_pp(long n, const Particle*, const char *path);
void txt_write_pp_ff(long n, const Particle*, const Force*, const char *path);

Read particles to a structure TxtRead. The TxtRead API returns the number of particles and pointers to the data. txt_read_fin frees the memory and invalidated pointers returned by txt_read_get_pp or txt_read_get_pp_ff.

void txt_read_pp(const char *path, TxtRead **);
void txt_read_pp_ff(const char *path, TxtRead **);
void txt_read_ff(const char *path, TxtRead **);
void txt_read_fin(TxtRead*);

int txt_read_get_n(const TxtRead *);
const Particle* txt_read_get_pp(const TxtRead *);
const Force*    txt_read_get_ff(const TxtRead *);

Text format is a as follow:

x y z vx vy vz [fx fy fz]
...
x y z vx vy vz [fx fy fz]

where every row corresponds to a particle.

21.4. Mesh

Read triangulated mesh from ascii off or ply files. Vertices coordinates are packed into array:

x0 y0 z0 x1 y1 z1 ... x[nv-1] y[nv-1] z[nv-1]

Triangles are packed in array of int4 with fields w left unused. For example, traingle i is

tt[i].x tt[i].y tt[i].z

mesh_get_nt, mesh_get_nv, mesh_get_ne, and mesh_get_md return the number of triangles, number of vertices, number of edges, and maximum degree.

void mesh_read_ini_off(const char *path, MeshRead**);
void mesh_read_ini_ply(const char *path, MeshRead**);
void mesh_read_fin(MeshRead*);

int mesh_read_get_nt(const MeshRead*);
int mesh_read_get_nv(const MeshRead*);
int mesh_read_get_ne(const MeshRead*);
int mesh_read_get_md(const MeshRead*);

const int4  *mesh_read_get_tri(const MeshRead*);
const float *mesh_read_get_vert(const MeshRead*);
const int4  *mesh_read_get_dih(MeshRead*);

21.5. Diagnostics

21.5.1. Particle

Uses only particles values. Dumps output to log files and to file path.

void diag_part_ini(const char *path, /**/ DiagPart**);
void diag_part_fin(DiagPart*);
void diag_part_apply(DiagPart*, MPI_Comm, float time, int n, const Particle*);

21.5.2. Mesh

Uses particle values and mesh.

21.6. Point

Write point data to bop files.

A configuration structure

void io_point_conf_ini(/**/ IOPointConf**);
void io_point_conf_fin(IOPointConf*);
void io_point_conf_push(IOPointConf*, const char *keys);

Push data

void io_point_ini(int maxn, const char *path, IOPointConf*, /**/ IOPoint**);
void io_point_fin(IOPoint*);
void io_point_push(IOPoint*, int n, const double *data, const char *keys);
void io_point_write(IOPoint*, MPI_Comm comm, int id);

Example

    enum {X, Y, Z};
    int i, id;
    double *r, rr[3*MAX_N], density[MAX_N];
    IOPointConf *c;
    IOPoint *p;
    for (i = 0; i < MAX_N; i++) {
        r = &rr[3*i];
        r[X] = i; r[Y] = 10*i; r[Z] = 100*i;
        density[i] = -i;
    }
    UC(io_point_conf_ini(&c));
    UC(io_point_conf_push(c, "x y z"));
    UC(io_point_conf_push(c, "density"));
    UC(io_point_ini(MAX_N, path, c, &p));
    UC(io_point_conf_fin(c));

    UC(io_point_push(p, MAX_N, rr, "x y z"));
    UC(io_point_push(p, MAX_N, density, "density"));
    id = 0;
    UC(io_point_write(p, comm, id));

21.7. VTK

Write vertices and triangles data to legacy vtk files. VTK structure uses VTKConf structure for initialization.

A configuration structure.

void vtk_conf_ini(MeshRead*, /**/ VTKConf**); (1)
void vtk_conf_fin(VTKConf*);
void vtk_conf_vert(VTKConf*, const char *keys); (2)
void vtk_conf_tri(VTKConf*, const char *keys); (3)
1 initilize configuration structure
2 "register" a "key" for data on vertices
3 "register" a "key" for data on triangles

Push data

void vtk_ini(MPI_Comm, int maxm, char const *dir, VTKConf*, /**/ VTK**); (1)
void vtk_points(VTK*, int nm, const Vectors*); (2)

void vtk_vert(VTK*, int nm, const Scalars*, const char *keys); (3)
void vtk_tri(VTK*, int nm, const Scalars*, const char *keys);  (4)

void vtk_write(VTK*, MPI_Comm, int id); (5)
void vtk_fin(VTK*);
1 initilize the structure, create directory
2 "push" coordinates of the vertices
3 "push" vertices data
4 "push" triangle data
5 write a file

21.8. restart

Dump and read restart files to restart simulation at a given time. Uses the bop file format.

21.8.1. naming

The restart files are named as:

BASE/CODE/ID.bop
BASE/CODE/ID.values
  • BASE: can be different for dump and read. Set to "strt" in default.cfg.

  • CODE: specified by the user, see code parameter in the interface. By convention, it corresponds to the quantity name followed by an additional specifier, such as "flu.pp" for particles of the solvent.

  • ID: a string specific to the time step. Common value is the dump index starting from 0. Special values are described in interface

21.8.2. interface

Writing a single file for all ranks:

void restart_write_pp(MPI_Comm, const char *base, const char *code, int id, long n, const Particle *pp); (1)
void restart_write_ii(MPI_Comm, const char *base, const char *code, int id, long n, const int *ii);      (2)
void restart_write_ss(MPI_Comm, const char *base, const char *code, int id, long n, const Rigid *ss);    (3)
void restart_write_stream_one_node(MPI_Comm, const char *base, const char *code, int id, const void *data, StreamWriter); (4)
1 write particle data
2 write integer data
3 write rigid data
4 the master rank only writes to a stream (text file)

The stream writer and reader make use of a function pointer:

typedef void (*StreamWriter)(const void*, FILE*);
typedef void (*StreamReader)(FILE*, void*);

where the data is passed through a void pointer.

The reader functions are symmetric to the writer:

void restart_read_pp(MPI_Comm, const char *base, const char *code, int id, int *n, Particle *pp);
void restart_read_ii(MPI_Comm, const char *base, const char *code, int id, int *n, int *ii);
void restart_read_ss(MPI_Comm, const char *base, const char *code, int id, int *n, Rigid *ss);
void restart_read_stream_one_node(const char *base, const char *code, int id, StreamReader, void *data);

The parameters are:

  • a communicator of all ranks dumping or reading the file

  • base: a string representing the base directory name

  • code: a string containing the directory name

  • id: the index of dump (starting at 0). Special values can be passed:

enum {RESTART_TEMPL=-1,
      RESTART_FINAL=-2};
enum {RESTART_BEGIN=RESTART_FINAL};

The final and begin special ids will replace the id part with the string "final". It is used to dump the last restart file. The templ special id will replace the id part with the string "templ". It is used for special quantities which do not change during the whole simulation, such as wall frozen particles or rigid frozen particles.

22. inter: interprocessing tools

Interprocessing tools: not pre or post processing, happens inside simulation…​

  • generate initial solvent colors

22.1. color

Generate colors initial conditions

22.1.1. interface

allocate, deallocate the helper structure:

void inter_color_ini(GenColor**);
void inter_color_fin(GenColor*);

set the geometry type

void inter_color_set_drop(float R, GenColor*); (1)
void inter_color_set_uniform(GenColor*); (2)
1 color with RED a ball of radius R in the center of the domain, and the rest BLUE
2 color all particles as BLUE

this can also be set via configuration file:

void inter_color_set_conf(const Config*, GenColor*);

Apply coloring to a set of particles host or device arrays:

void inter_color_apply_hst(const Coords*, const GenColor*, int n, const Particle *pp, /**/ int *cc);
void inter_color_apply_dev(const Coords*, const GenColor*, int n, const Particle *pp, /**/ int *cc);

23. Math

Helpers for math operations and random number generation

23.1. linal

Linear algebra functions

void linal_inv3x3(const float m0[6], /**/ float minv0[6]); (1)
1 inverse symmetric 3 by 3 matrix; fails for singular matrices with determinant smaller than epsilon (hardcoded to 1e-8).

structure of the matrix is xx, xy, yy, yz, zz, where

\[A = \begin{bmatrix} xx & xy & xz \\ xy & yy & yz \\ xz & yz & zz \end{bmatrix}\]

23.2. Host random number generation

Generate uniform random numbers on host in floating point precision. Uses KISS algorithm

struct RNDunif;                                             (1)
void rnd_ini(int x, int y, int z, int c, /**/ RNDunif **r); (2)
void rnd_fin(RNDunif *r);                                   (3)
float rnd_get(RNDunif *r);                                  (4)
1 hidden structure containing state of the rng
2 allocate and initialize the rng (x, y, z and c are states of the rng)
3 deallocate rng states
4 generate a random number in single precision; modify the state

23.3. tform: linar coordinate transformation for 3D vector

Transform vector a[3] to b[3] using

    b[X] = s[X]*a[X] + o[X]
    b[Y] = s[Y]*a[Y] + o[Y]
    b[Z] = s[Z]*a[Z] + o[Z]

Coefficient s and o are defined implicitly by API calls.

23.3.1. interface

Allocate/Deallocate
void tform_ini(Tform**);
void tform_fin(Tform*);

Set parameters of the transformation, some transfroms are for grids which are cell centered and defined by low and high boundaries and the number of cell.

void tform_vector(const float a0[3], const float a1[3],   const float b0[3], const float b1[3], /**/ Tform*); (1)
void tform_chain(const Tform *A, const Tform *B, /**/ Tform *C); (2)
void tform_to_grid(const float lo[3], const float hi[3], const int n[3], /**/ Tform*); (3)
void tform_from_grid(const float a0[3], const float b0[3], const int n[3], /**/ Tform*); (4)
void tform_grid2grid(const float f_lo[3], const float f_hi[3], const int f_n[3],
                     const float t_lo[3], const float t_hi[3], const int t_n[3],
                     Tform*); (5)
1 converts vectors a0 to a1 and b0 to b1
2 a chain of two transforms: C(x) is equivalent to 'B(A(x))'
3 transfrom to a grid: for example, it transfroms lo to -1/2 and hi to n - 1/2
4 transfrom from a grid: for example, it transfroms -1/2 to lo and n - 1/2 to hi
5 grid to grid transform
Convert vector
void tform_convert(const Tform*, const float a0[3], /**/ float a1[3]);
convert to view
void tform_to_view(const Tform*, /**/ Tform_v*);
log

Log or dump the state of 'Tform'

void tform_log(Tform*);
void tform_dump(Tform*, FILE*);

23.4. triangle functions

Host and device functions for a given triangle. For area computation a formula stable to numerical error is used.

double kahan_area0(double a, double b, double c); (1)
double kahan_area(const double a[3], const double b[3], const double c[3]); (2)
void   ac_bc_cross(const double a[3], const double b[3], const double c[3], /**/ double r[3]); (3)
double shewchuk_area(const double a[3], const double b[3], const double c[3]); (4)
double orient3d(const double a[3], const double b[3], const double c[3], const double d[3]); (5)
void dihedral_xy(const double a[3], const double b[3], const double c[3], const double d[3],
                   double *x, double *y); (6)
1 area of a triangle given sides
2 area of a triangle given coordinate of the vertices
3 \( \left(\mathbf{a} - \mathbf{c}\right) \times \left(\mathbf{b} - \mathbf{c}\right)\)
4 area of a triangle given coordinate of the vertices using "Shewchuk formula"
5 orient3d from Jonathan Richard Shewchuk
6 dihedral angle

24. Mesh

helpers for computation on triangle mesh.

Mesh are stored with an array of vertices (vv) or particles (pp) in an array of structure fashion. The connectivity is a set of vertices indices grouped by triangles. For performance and convenience reasons, they are stored in a int4 array, each element storing ids of the three vertices of the triangle (the last field is not used).

More complex connectivity structures are defined in appropriate submodules, see edg or adj.

24.1. area: compute area of a mesh

Compute the area of a triangulated mesh.

void mesh_area_ini(MeshRead*, /**/ MeshArea**); (1)
void mesh_area_fin(MeshArea*);

double mesh_area_apply0(MeshArea*, Vectors*);                     (2)
void  mesh_area_apply(MeshArea*, int nm, Vectors*, /**/ double*); (3)
1 in initialization triangles are stored
2 compute the area of one object
3 compute the areas of nm objects

24.2. tri_area: compute area of triangles in a mesh

Compute the area of every triangle of the mesh.

void mesh_tri_area_ini(MeshRead*, /**/ MeshTriArea**); (1)
void mesh_tri_area_fin(MeshTriArea*);

void  mesh_tri_area_apply(MeshTriArea*, int nm, Vectors*, /**/ double*); (2)
1 in initialization triangles are stored
2 compute areas

nm is the number of meshes.

24.3. angle: compute angles between triangles with a common edge

void mesh_angle_ini(MeshRead*, /**/ MeshAngle**); (1)
void mesh_angle_fin(MeshAngle*);
void  mesh_angle_apply(MeshAngle*, int nm, Vectors*, /**/ double*); (2)
1 in initialization triangles are stored
2 triangle for nm objects, the size of the output is nm * ne

24.4. edges length: compute length of the edges

void mesh_edg_len_ini(MeshRead*, MeshEdgLen**); (1)
void mesh_edg_len_fin(MeshEdgLen*);
void mesh_edg_len_apply(MeshEdgLen*, int nm, Vectors*, /**/ double*); (2)
1 in initialization triangles are stored
2 edges for nm objects, the size of the output is nm * ne

24.5. gen: generate mesh

Generate mesh vertices from template mesh and affine transform matrices (see matrices submodule).

/* vv : vertices of the mesh template : x y z x y z, ...   */
int  mesh_gen_from_file(const Coords*, const float *vv, const char *ic, int nv, /**/ Particle*); (1)
void mesh_gen_from_matrices(int nv, const float *vv, const Matrices*, /**/ int *pn, Particle*);  (2)
void mesh_shift(const Coords*, int n, /**/ Particle*); (3)
1 generate and filter meshes from file ic. Only those belonging to subdomain are kept.
2 generate from a matrices array
3 shift a set of vertices particles from global to local coordinates

24.6. matrices: affine transformation

An array of 4 x 4 affine transformation matrices.

\[\begin{bmatrix} \vec{y} \\ 1 \end{bmatrix} = \left[ \begin{array}{ccc|c} \, & A & & \vec{b} \ \\ 0 & \ldots & 0 & 1 \end{array} \right] \begin{bmatrix} \vec{x} \\ 1 \end{bmatrix}\]
void matrices_read(const char *path, /**/ Matrices**); (1)
void matrices_read_filter(const char *path, const Coords*, /**/ Matrices**); (2)

void matrices_get(const Matrices*, int i, /**/ double**); (3)
void matrices_get_r(const Matrices*, int i, /**/ double r[3]); (4)
int  matrices_get_n(const Matrices*); (5)
void matrices_log(const Matrices*);   (6)

void matrices_fin(Matrices*);
1 read from input file
2 read from input file and keep only matrices which "belong" to this MPI rank
3 return matrix as double[16]
4 return the shift of the ith matrix as double[3]
5 return the number of matrices
6 print the array of matrices to stderr using msg_print

24.7. "scatter": distribute scalar values between different elements of the mesh

void mesh_scatter_ini(MeshRead*, MeshScatter**); (1)
void mesh_scatter_fin(MeshScatter*);
void mesh_scatter_edg2vert(MeshScatter*, int nm, Scalars*, /**/ double*); (2)
void mesh_scatter_edg2vert_avg(MeshScatter*, int nm, Scalars*, /**/ double*); (2)
1 in initialization triangles are stored
2 edge to vertices

24.8. Triangles

Triangles on device. It is created from MeshRead*.

void triangles_ini(MeshRead*, /**/ Triangles**);
void triangles_fin(Triangles*);

the client must include type.h and can access fields directly.

struct Triangles {
    int4 *tt;
    int nt;
};

nt is the number of triangles.

24.9. vertices' area

Compute area associated with every vertices. Assume every adjusting triangle contributes one third it its area to a vertice.

void mesh_vert_area_ini(MeshRead*, /**/ MeshVertArea**); (1)
void mesh_vert_area_fin(MeshVertArea*);

void  mesh_vert_area_apply(MeshVertArea*, int nm, Vectors*, /**/ double*); (2)
1 in initialization triangles are stored
2 compute area

24.10. volume: compute volume of a mesh

Compute the volume enclosed by a triangulated mesh on host.

void mesh_volume_ini(MeshRead*, /**/ MeshVolume**); (1)
void mesh_volume_fin(MeshVolume*);

double mesh_volume_apply0(MeshVolume*, Vectors*);                      (2)
void  mesh_volume_apply(MeshVolume*, int nm, Vectors*, /**/ double*); (3)
1 in initialization triangles are stored
2 compute volume for one object
3 compute volume for nm objects

25. pair: pairwise interactions

pairwise forces. manages the parameters for dpd and repulsive Lennard-Jones parameters.

25.1. dpd interactions

DPD particles interact via the following force:

\[\mathbf{F}_{ij} = \left( F_{ij}^C + F_{ij}^R + F_{ij}^D \right) \mathbf{e}_{ij}\]

where

\[F_{ij}^C = \alpha w(r), \\ w(r) = \max \left( 1 - \frac{r_{ij}} {r_c}, 0 \right)\]
\[F_{ij}^R = \sigma w^R(r_{ij}) W_{ij}\]
\[F_{ij}^D = -\gamma w^R(r_{ij})^2 \mathbf{v}_{ij} \cdot \mathbf{r}_{ij}\]

with

\[\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j, \\ r_{ij} = | \mathbf{r}_{ij} |, \\ \mathbf{v}_{ij} = \mathbf{v}_i - \mathbf{v}_j.\]

The kernel stem:{w^R} is given by

\[w^R(r) = (1-r)^{spow},\]

where \(spow\) is a number between 0 and 1. The default value is 0.25.

25.2. repulsive LJ

Repulsive forces for contact are implemented using the repulsive LJ potential. It is given by

\[\mathbf{F}_{ij} = F_{ij}^{rLJ} \mathbf{e}_{ij}\]

where

\[F_{ij}^{rLJ} = \max \left( 0, F^{LJ}(r_{ij})\right)\]
\[F^{LJ}(r) = 24 \epsilon \frac{1}{r} \left( 2 \left[\frac \sigma r \right]^12 - \left[\frac \sigma r \right]^6 \right)\]

for numerical reasons, the magnitude is limited to \(10^4\).

25.3. Host interface

allocate, deallocate the structure:

void pair_ini(PairParams **);
void pair_fin(PairParams *);

set parameters:

void pair_set_dpd(int ncol, const float a[], const float g[], float spow, PairParams *p); (1)
void pair_set_lj(float sigma, float eps, PairParams *p); (2)
1 set the DPD parameters. ncol is the number of colors. This is mandatory for any view
2 set the Lennard-Jones parameters \(\sigma\) and \(\epsilon\). This may be called only when the LJ parameters are needed.

set parameters from configuration file:

void pair_set_conf(const Config *, const char *name_space, PairParams *);

the argument name_space denotes the namespace where to look for the parameters (see below).

Compute the \(\sigma\) parameter for a given temperature and timestep:

void pair_ini(PairParams **);
void pair_fin(PairParams *);

Get the different views:

void pair_get_view_dpd(const PairParams*, PairDPD*);
void pair_get_view_dpd_color(const PairParams*, PairDPDC*);
void pair_get_view_dpd_mirrored(const PairParams*, PairDPDCM*);
void pair_get_view_dpd_lj(const PairParams*, PairDPDLJ*);

25.4. device interface

Inside a CUDA kernel, the pairwise force can be called by the generic function

template <typename Param, typename Fo>
_I_ void pair_force(const Param *p, PairPa a, PairPa b, float rnd, /**/ Fo *f)

The PairPa structure is a generic particle with color. This should be used with the parray module. The Fo structure is a generic force and can be either PairFo or PairSFo, respectively, for force or force and stress computation. This should be used with the farray module. The input parameter Param p is a view for pair parameters.

The above function behaves differently given different types:

  • PairDPD: standard DPD interaction

  • PairDPDLJ: standard DPD and repulsive LJ interaction

  • PairDPDC: colored DPD interaction: pick up parameters as a function of the particle colors

  • PairDPDCM: mirrored DPD: same as colored DPD considering that particle b has the same color as particle a.

Inside a kernel, the generic force can be added by using

_I_ void pair_add(const PairFo *b, /**/ PairFo *a)
_I_ void pair_add(const PairSFo *b, /**/ PairSFo *a)

25.5. Configuration

parameters are configured through 3 namespaces: flu, fsi and cnt.

example:

flu = {
    dpd = true
    a = [10.0, 80.0, 10.0]
    g = [1.0, 8.0, 15.0]
    spow = 0.25
    lj = false
}

fsi = {
    dpd = true
    a = [10.0, 80.0, 10.0]
    g = [1.0, 8.0, 15.0]
    spow = 0.25
    lj = false
}

cnt = {
    dpd = true
    a = [0.0]
    g = [16.0]
    spow = 0.25
    lj = true
    ljs = 0.3
    lje = 0.44
}

The DPD parameters can be different when many colors are set. User provides parameters in an array following the packed symmetric matrix convention. For example the interaction table for 3 colors

\[\begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{12} & x_{22} & x_{23} \\ x_{13} & x_{23} & x_{33} \end{bmatrix}\]

becomes

\[\begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{22} & x_{23} & x_{33} \end{bmatrix}.\]

26. meshbb

Bounce back particles on a moving mesh

Let define the following quantities:

  • \(\mathbf{r}_0, \mathbf{v}_0\) : particle at time \(0\)

  • \(\mathbf{r}_1, \mathbf{v}_1\) : particle at time \(dt\) if there is no collision

  • \(\mathbf{r}_n, \mathbf{v}_n\) : particle at time \(dt\) after bounce back

  • \(h \in [0, dt\)] : collision time

  • \(\mathbf{r}_w, \mathbf{v}_w\) : collision position and velocity of the surface

A particle colliding with a surface is bounced back as:

\[\mathbf{v}_n = 2 \mathbf{v}_w - \mathbf{v}_0, \\ \mathbf{r}_n = \mathbf{r}_w + (dt - h) \mathbf{v}_n.\]

This module computes the collision time, position and velocity for a moving triangle. The motion of the triangle points are assumed to be linear:

\[\mathbf{a}(t) = (1 - t) \mathbf{a}_0 + t \mathbf{a}_1, \\ \mathbf{b}(t) = (1 - t) \mathbf{b}_0 + t \mathbf{b}_1, \\ \mathbf{c}(t) = (1 - t) \mathbf{c}_0 + t \mathbf{c}_1.\]

26.1. interface

Allocate and deallocate MeshBB structure:

void mesh_bounce_ini(int maxpp, /**/ MeshBB **d);
void mesh_bounce_fin(/**/ MeshBB *d);

Bouncing particles on a set of mesh:

void mesh_bounce_reini(int n, /**/ MeshBB *d);  (1)
void mesh_bounce_find_collisions(float dt, int nm, MeshInfo meshinfo, const Positioncp *i_rr, int3 L,
                                 const int *starts, const int *counts, const Particle *pp, const Particle *pp0, /**/ MeshBB *d);  (2)
void mesh_bounce_select_collisions(int n, /**/ MeshBB *d);  (3)
void mesh_bounce_bounce(float dt, float mass, int n, const MeshBB *d, MeshInfo meshinfo,
                        const Positioncp *i_rr, const Particle *pp0, /**/ Particle *pp, Momentum *mm);  (4)
1 reinitialise the structure (must be done before a new time step)
2 store collision informations inside the structure with a set of mesh. This can be done for multiple mesh sets.
3 For each bouncing particle, select the collision with minimum time. Must be called after finding all collisions.
4 bounce particles and store the momentum changes per triangle

The momentum changes computed above can be collected to rigid objects or to membrane mesh:

void mesh_bounce_collect_rig_momentum(int ns, MeshInfo meshinfo, const Particle *pp, const Momentum *mm, /**/ Rigid *ss); (1)
void mesh_bounce_collect_rbc_momentum(float mass, int nc, MeshInfo mi, const Momentum *mm, /**/ Particle *pp);            (2)
1 Reduce Momentum array into one force and torque per rigid object.
2 Convert momentum information into forces for membrane particles.

27. Red blood cell

Red blood cell membranes are represented via particles with connectivity.

This module stores the required quantities and functions for:

  • initializing red blood cells positions and orientations

  • managing particles and connectivity

  • membrane force computations

27.1. Elastic energies

27.1.1. spring

Spring energy is a function of distance between connected beads \(r\)

\[U_{spring}(r) = U_{wlc}(r) + U_{pow}(r)\]

where worm-like chain energy:

\[U_{wlc}(r) = \frac{k_s}{4 l_{max}} \frac{3 x^2 - 2 x^3}{1 - x}, \quad x = \frac{r}{l_{max}}\]

and power-law energy:

\[U_{wlc}(r) = - \frac{k_p}{r^{m - 1}}\]

\(l_0\) is an equilibrium lenght, \(l_{max}\) is a maximum lenght \(l_{max} = l_{0} / x_{0}\), and \(k_p\) is defined from condition that \(U_{spring}\) has a maximum at \(r = l_0\), and \(m\) is a power factor.

27.1.2. local area constrain

For every triangle with area \(A^{loc}\) and an equilibrium area \(A_0^{loc}\)

\[E^{loc}_{area} = k_d \frac { \left( A^{loc} - A_0^{loc} \right)^2 } {2 A_0^{loc}}\]

27.1.3. total area constrain

For whole mesh with area \(A^{tot}\) and an equilibrium area \(A_0^{tot}\)

\[E^{tot}_{area} = k_a \frac {\left( A^{tot} - A_0^{tot} \right)^2} {2 A_0^{tot} }\]

27.1.4. total volume constrain

For whole mesh with volume \(V\) and an equilibrium volume \(V_0\)

\[E_{vol} = k_v \frac {\left( V - V_0 \right)^2} {2 V_0}\]

27.1.5. bending

\(\theta\) is an angle between triangles with common edge, \(\theta_0\) is an equilibrium angle.

\[E_{bnd} = k_b \left[ 1 - \cos \left( \theta - \theta_0 \right) \right]\]

27.2. Quantities

RbcQuants structure contains particles of membrane and connectivity.

struct RbcQuants {
    int n, nc;               /* number of particles, cells                */
    int nt, nv;              /* number of triangles and vertices per mesh */
    int md;                  /* maximum valence of the mesh               */
    Particle *pp, *pp_hst;   /* vertices particles on host and device     */
    bool ids;                /* global ids active ?                       */
    int *ii;                 /* global ids on host                        */
    AreaVolume *area_volume; /* to compute area and volume */
};

The above data structure is allocated and freed using:

void rbc_ini(long maxnc, bool ids, const MeshRead*, RbcQuants*);
void rbc_fin(RbcQuants*);

The generation is made in two stages: mesh generations and other quantities (global ids). This allows to remove red blood cells colliding with other entities before creating global ids.

void rbc_gen_mesh(const Coords*, MPI_Comm, const MeshRead*, const char *ic, /**/ RbcQuants*); (1)
void rbc_gen_freeze(MPI_Comm, /**/ RbcQuants*);                                               (2)
void rbc_strt_quants(MPI_Comm, const MeshRead*, const char *base, const char *name, const int id, RbcQuants*);  (3)
void rbc_strt_dump(MPI_Comm, const char *base, const char *name, int id, const RbcQuants *q); (4)
1 Generate mesh from file "ic" (see [mesh_gen])
2 Generate global ids of rbc
3 Generate quantities from restart
4 Dump a restart state of the quantities

Available mesh are stored in src/data/cells

27.3. Submodules

27.3.1. adj: mesh adjacency structure

Consider closed triangulated mesh with nv vertices. Degree of a vertex is the number of edges incident to the vertex. Let md is a maximum degree of the vertices. There are three types of forces: edge forces which depend on two vertices, triangle forces which depend on three vertices, and dihedral forces which depend of four vertices (edge and two adjusting triangles). To compute forces in one membrane uDeviceX launches nv * md threads, each thread adds forces to only one vertices. Adj_v has an API call which allows for a given thread ID to find which vertices should be used in forces computation.

static __device__ int adj_get_map(int i, Adj_v *adj, /**/ AdjMap *m) {

If md * nv is bigger than the number of edges in the membrane and some threads are not needed. The function returns 0 in this case. AdjMap is a structure which contains five indices and an ID of the mesh:

struct AdjMap {
    int i0, i1, i2, i3, i4;
    int rbc;
};

Each thread adds force to a vertices with index 0. It considers edge 01, triangle 012, and two dihedrals 0124 and 1203. For example, a contribution of edge force 01 to vertex 1 is added by another thread.

dih

Adj_v is initilized from a host structure Adj:

void adj_ini(int md, int nt, int nv, const int4 *tt, /**/ Adj**);
void adj_fin(Adj*);
int  adj_get_map(int i, const Adj*, /**/ AdjMap *m);
int  adj_get_max(const Adj*);
void adj_get_anti(const Adj*, /**/ int *anti);

void adj_view_ini(const Adj*, Adj_v**);
void adj_view_fin(Adj_v*);

Adj can be used to pack data which can be asseced on device:

float *A;
n = adj_get_max(adj);
EMALLOC(n, &A);
for (i = 0; i < n; i++) {
   valid = adj_get_map(i, adj, /**/ &m);
   if (!valid) continue;
   i0 = m.i0; i1 = m.i1; i2 = m.i2;
   A[i] = some_function_of(i0, i1, i2)
}

27.3.2. com: center of mass

Compute position and velocity of each mesh of the membrane.

void rbc_com_ini(int nv, int nm_max, /**/ RbcCom**);
void rbc_com_fin(/**/ RbcCom*);
void rbc_com_apply(RbcCom*, int nm, const Particle*, /**/ float3 **rr, float3 **vv);

max_nm is a maximum allowed number of meshes. It is a runtime error if nm exceeds nm_max.

27.3.3. Forces

Model

The membrane forces are computed by differentiating the membrane energy with respect to position:

spring

A force between two connected beads. \(l_0\) is an equilibrium spring lenght, \(l_{max}\) is a maximum spring lenght \(l_0 = l_0 / x_0\).

A force between neighboring consissts of repulsive worm-like-chain part

\[F_{wlc}(r) = \frac{k_{s}} {l_{max}} \frac{4 r^2 - 9 r + 6} {4 (r - 1)^2}\]

and attractive power law part

\[F_{pow}(r) = \frac{k_{pow}} {r^{m + 1}}\]
Other forces

(from rnd)

g = gammaC; T = kBT0
f0  = sqrtf(2*g*T/dt)*rnd

TODO

interface

Helpers are stored in RbcForce structure. Informations about stress free state and random forces are stored inside.

Allocate, deallocate structure:

void rbc_force_ini(const MeshRead*, RbcForce**);
void rbc_force_fin(RbcForce*);

The following setters must be called before force computation:

void rbc_force_set_stressful(int nt, float totArea, /**/ RbcForce*); (1)
void rbc_force_set_stressfree(const char *fname, /**/ RbcForce*);    (2)

void rbc_force_set_rnd0(RbcForce*);           (3)
void rbc_force_set_rnd1(int seed, RbcForce*); (4)
void rbc_force_set_bending(const MeshRead*, const char*, RbcForce*);
1 set stressful state: (no stress free shape)
2 set stressfree shape from file fname
3 disable random forces
4 enable random forces

Alternatively, user can set the above options through configuration file:

void rbc_force_set_conf(const MeshRead*, const Config *cfg, const char *name, RbcForce*);

The forces are computed by calling:

void rbc_force_apply(RbcForce*, const RbcParams*, float dt, const RbcQuants*, /**/ Force*); (1)
void rbc_force_stat(/**/ float *pArea, float *pVolume); (2)
1 compute forces
2 report statistics

27.3.4. force/area_volume: area and volume

Compute area and volume for every object.

void area_volume_ini(int nv, int nt, const int4 *tt, int max_cell, AreaVolume**); (1)
void area_volume_fin(AreaVolume*);

void area_volume_compute(AreaVolume*, int nm, const Particle*, /**/ float **av); (2)
void area_volume_host(AreaVolume*, /**/ float **av); (3)
1 initialization, nv, nt, tt: number of vertices, number of triangles, a triangle structure on host, max_nm is a maximum allowed number of meshes
2 compute area and volume for every mesh, nm is a number of meshes, output av is a devices array in the form area[0] volume[0] …​ area[nc-1] volume[nc-1]
3 after area_volume_compute call, area_volume_host returns a host version of av array

27.3.5. force/rnd: random generator

Device random number generator for random forces in rbc

allocate, deallocate structure:

void rbc_rnd_ini(int n, long seed, RbcRnd**);
void rbc_rnd_fin(RbcRnd*);

The seed can be set from timeby passing the special value:

enum {SEED_TIME = -1}; /* magic seed: ini from "time" */

Random number generation is clled as follows:

void rbc_rnd_gen(RbcRnd*, int n, /**/ float**); (1)
float rbc_rnd_get_hst(const RbcRnd*, int i);    (2)
1 generate n random numbers on device
2 get a single random number on host (debug purpose)

27.3.6. params: rbc parameters

Parameters for the membrane forces. Consists of two structures: RbcParams (hidden, on host) and RbcParams_v (view, to be passed to kernels).

RbcParams are allocated and freed with

void rbc_params_ini(RbcParams **);
void rbc_params_fin(RbcParams *);

The parameters can be set manually with:

void rbc_params_set_fluct(float gammaC, float gammaT, float kBT, RbcParams *);
void rbc_params_set_bending(float kb, float phi, RbcParams *);
void rbc_params_set_spring(float ks, float x0, float mpow , RbcParams *);
void rbc_params_set_area_volume(float ka, float kd, float kv, RbcParams *);
void rbc_params_set_tot_area_volume(float totArea, float totVolume, RbcParams *);

or via the configuration file:

void rbc_params_set_conf(const Config *c, const char *name, RbcParams *);

Parameters are accessible via getter functions:

double rbc_params_get_tot_volume(const RbcParams *);

The view structure can be created with:

RbcParams_v rbc_params_get_view(const RbcParams *);

27.3.7. shape

Helper for stress free state. Compute membrane quantities on host.

Allocate, deallocate structure:

void rbc_shape_ini(const Adj*, const float *rr, /**/ RbcShape**);
void rbc_shape_fin(/**/ RbcShape*);

where the positions rr are stored in a AoS fashion x0 y0 z0 x1 y1 z1 …​. Also needs adjacency structure as input.

Compute quantities (on host):

void rbc_shape_edg (RbcShape*, /**/ float**); (1)
void rbc_shape_area(RbcShape*, /**/ float**); (2)

void rbc_shape_total_area(RbcShape*, /**/ float*);    (3)
void rbc_shape_total_volume(RbcShape*, /**/ float*);  (4)
1 compute edge lengths (order: see Adj module
2 compute area of triangles
3 compute total area
4 compute total volume

27.3.8. stretch RBC

Apply a constant force to every vertex of every cell. The force is configured from the file path.

Format of the data file:

fx0 fy0 fz0
fx1 fy1 fz1
...
fx_nv fy_nv fz_nv

nv is the number of vertices.

void rbc_stretch_ini(const char *path, int nv, /**/ RbcStretch**); /* `nv` is for error check */
void rbc_stretch_fin(RbcStretch*);
void rbc_stretch_apply(int nm, const RbcStretch*, /**/ Force*);

28. rig: rigid objects

Rigid objects consist of frozen particles moving in a rigid fashion. Interactions with other objects are modeled via pairwise interactions with the frozen particles.

Rigid objects also contain a surface representation on which other particles may bounce back.

Rigid objects are numbered and need a unique id for dumping diagnosis.

28.1. Quantities

28.1.1. data structure

RigidQuants structure contains particles of rigid objects as well as mesh particles. Solid structures contain center of mass, orientation, angular and linear velocity of the rigid object.

struct RigQuants {
    int n, ns, nps;              /* number of particles (total), solid, particle per solid        */
    Particle *pp_hst, *pp;       /* particles on hst and device                                   */
    Rigid    *ss_hst, *ss;       /* rigid strutures                                               */
    float   *rr0_hst, *rr0;      /* frozen particle templates                                     */

    /* mesh related quantities */
    int nt, nv;                   /* number of [t]riangles and [v]ertices                          */
    int4 *dtt;                    /* triangle indices on device                                    */
    float *dvv;                   /* vertices on device (template)                                 */
    Particle *i_pp_hst, *i_pp;    /* particles representing all meshes of all solids of that node  */

    Rigid *ss_dmp;

    long maxp; /* maximum particle number */
};

28.1.2. Interface

allocate and deallocate the quantities:

void rig_ini(long maxs, long maxp, const MeshRead*, RigQuants*);
void rig_fin(RigQuants *q);

maxp is the maximum particle number per node.

Rigid bodies generation is performed in two stages: create the mesh, then create the rigid particle template, properties and global ids. This allows to remove rigid bodies colliding with other entities before the second phase. The interface for generation is:

void rig_gen_mesh(const Coords*, MPI_Comm, const MeshRead*, const char *ic, /**/ RigQuants *); (1)
void rig_gen_freeze(const Coords*, bool empty_pp, int numdensity, float rig_mass, const RigPinInfo*,
                    MPI_Comm, const MeshRead*, Particle *opp, int *on, RigQuants *q);          (2)
void rig_strt_quants(MPI_Comm, const MeshRead*, const char *base, const char *name, const int id, RigQuants *q); (3)
1 generate the meshes from file ic (see [mesh_gen])
2 generate particle template, global ids and properties of rigid objects given mesh. This operation will kill solvent particles inside the rigid objects
3 generate quantities from restart files

Restart files are dumped using the following functions (see restart):

void rig_strt_dump_templ(MPI_Comm, const char *base, const char *name, const RigQuants *q);         (1)
void rig_strt_dump(MPI_Comm, const char *base, const char *name, const int id, const RigQuants *q); (2)
1 to be called only once: dump the frozen particle in local referential (common to every rigid objects)
2 dump Solid structures, describing position, orientation and velocity of the rigid objects

28.2. update

Position and velocities of rigid objects are updated from forces exerted on the particles.

28.2.1. pin info

The hidden structure RigPinInfo is used to store information about constrained rigid motion. This can be used to disable the movement of the center of mass of the rigid object along \(x, y\) or \(z\) axis (see com parameter). User can also block the rotation along one of the axes (see axis parameter).

example:

  • rigid object can rotate but is only able to move along the \(x\) axis:

com = {0, 1, 1}
axis = {0, 0, 0}
  • rigid object can only rotate along the \(z\) axis and its center of mass is not able to move:

com = {1, 1, 1}
axis = {1, 1, 0}
void rig_pininfo_ini(RigPinInfo **); (1)
void rig_pininfo_fin(RigPinInfo *); (2)
void rig_pininfo_set(int3 com, int3 axis, RigPinInfo *); (3)
void rig_pininfo_set_pdir(int pdir, RigPinInfo *); (4)
void rig_pininfo_set_conf(const Config *cfg, const char *ns, RigPinInfo *); (5)

int rig_pininfo_get_pdir(const RigPinInfo *); (6)
1 allocate the hidden structure
2 deallocate the hidden structure
3 set the motion constrains
4 for periodic objects, set the direction of periodicity (e.g. cylinder)
5 set the above parameters from configuration file
6 get the perioid direction

The periodic direction can be 0, 1 or 2 for the \(x, y\) or \(z\) axis, respectively. Default value for no periodicty is set via

enum {NOT_PERIODIC = -1};

The configuration has the following syntax:

rig = {
    pin_com = [0, 0, 0]
    pin_axis = [0, 0, 0]
    pdir = -1
}

28.2.2. interface

naming:

  • ns: number of rigid objects

  • ss: array of rigid structures Solid

  • rr0: positions of the rigid particles in the rigid object reference frame

  • nps: number of rigid particle per rigid object

void rig_reinit_ft(const int ns, /**/ Rigid *ss); (1)
void rig_update(const RigPinInfo *pi, float dt, int n, const Force *ff, const float *rr0, int ns, /**/ Particle *pp, Rigid *ss); (2)
void rig_generate(int ns, const Rigid *ss, int nps, const float *rr0, /**/ Particle *pp); (3)
void rig_update_mesh(float dt, int ns, const Rigid *ss, int nv, const float *vv, /**/ Particle *pp); (4)
1 set the force and torque to 0
2 update particle and rigid objects positions and velocity
3 generate rigid particles from template positions and rigid structures
4 update the mesh position

28.3. Submodules

28.3.1. gen

helper to generate rigid particles from solvents and compute rigid properties

The solvent particles inside the given rigid objects are removed by this operation.

void rig_gen_from_solvent(const Coords*, MPI_Comm, RigGenInfo, /*io*/ FluInfo, /**/ RigInfo);

To avoid long argument list, the data is packed in helper structures:

struct RigGenInfo {
    float mass;
    int numdensity;
    const RigPinInfo *pi;
    int nt, nv;
    const int4 *tt;     /* on device */
    const Particle *pp; /* on device */
    bool empty_pp;
    const MeshRead *mesh;
};

struct FluInfo {
    Particle *pp;
    int *n;
};

/* everything on host */
struct RigInfo {
    int ns, *nps;
    float *rr0;
    Rigid *ss;
    Particle *pp;
};

29. scheme

time stepping scheme related modules

29.1. body forces

Force applied to every particles with active flag push. It is defined by two structures:

  • BForce, contains all informations needed for the force field. It is hold and maintained on host.

  • BForce_v (BForce view), containing only the information needed at a specific time. It is passed to device kernels.

29.1.1. interface

allocate/deallocate the structure:

void bforce_ini(BForce **p);
void bforce_fin(BForce *p);

initialize to a given force function:

void bforce_set_none(/**/ BForce *p);            (1)
void bforce_set_cste(float3 f, /**/ BForce *p);  (2)
void bforce_set_dp(float a, /**/ BForce *p);     (3)
void bforce_set_shear(float a, /**/ BForce *p);  (4)
void bforce_set_rol(float a, /**/ BForce *p);    (5)
void bforce_set_rad(float a, /**/ BForce *p);    (6)
1 no force
2 constant force f
3 double poiseuille
4 shear
5 4 roller mill
6 radial force

initialize from configuration:

void bforce_set_conf(const Config *cfg, /**/ BForce *bf);

interface:

void bforce_adjust(float3 f, /**/ BForce *fpar); (1)
void bforce_apply(const Coords *c, float mass, const BForce *bf, int n, const Particle *pp, /**/ Force *ff); (2)
1 change value of the force
2 apply body force to particles

29.1.2. configuration

none

no body force (default value)

bforce = {
    type = "none"
}
constant

constant force

bforce = {
    type = "constant"
    f    = [1.0, 0.0, 0.0]
}
double poiseuille

fx = y > yc ? a : -a

bforce = {
    type = "double_poiseuille"
    a    = 1.0
}
shear

fx = a * (y - yc)

bforce = {
    type = "shear"
    a    = 1.0
}
4 roller mill
    f.x =  2*sin(x)*cos(y) * a;
    f.y = -2*cos(x)*sin(y) * a;

while x and y are coordinates relative to the domain center and normalized to have x = Pi and x = -Pi at the domain edges.

bforce = {
    type = "four_roller"
    a    = 1.0
}
radial

fr = a * er / r, where fr is radial force, er is unit radial vector at position r and r is radial position.

bforce = {
    type = "rad"
    a    = 1.0
}

29.2. move

update particles positions and velocities given forces. Currently implemented:

  • Forward Euler

  • Velocity-Verlet

void scheme_move_apply(float dt, float mass, int n, const Force*, Particle*); (1)
void scheme_move_clear_vel(int n, Particle*); (2)
1 apply one step of the time scheme to the particles
2 set velocity of particles to 0

29.3. restrain

Restrain the movement of drop or rbc membrane by shifting center of mass velocity.

Consider a subset of particles \(P\) with indices \(I \subset \{i\}_{i = 1}^{n}\). The Restrain module shifts the center of mass velocity of \(P\) such that

\[\sum\limits_{i \in I} \mathbf{v}_i = 0.\]

The subset \(I\) depends on the colors of the solvent in case of a single drop.

29.3.1. interface

allocate and free host structure:

void scheme_restrain_ini(Restrain**);
void scheme_restrain_fin(Restrain*);

Set parameters:

void scheme_restrain_set_red(Restrain*); (1)
void scheme_restrain_set_rbc(Restrain*); (2)
void scheme_restrain_set_none(Restrain*); (3)
void scheme_restrain_set_freq(int freq, Restrain*); (4)
1 Set to restrain the "red" solvent only
2 Set to restrain the rbc membrane only
3 Set for no restrain
4 Set the diagnosis frequency (0 for no diagnosis)

Set parameters from configuration file:

void scheme_restrain_set_conf(const Config*, Restrain*);

Apply restrain (note: this calls collective mpi operation):

void scheme_restrain_apply(MPI_Comm comm, const int *cc, long it, /**/ Restrain *r, SchemeQQ qq);

29.3.2. configuration

restrain = {
    kind = ["none", "red", "rbc"]
    freq = 1000;
}

29.4. Time line

Manage simulation time.

Allocate, deallocate structure:

void time_line_ini(float start, /**/ TimeLine**);
void time_line_fin(TimeLine*);

Manipulate time:

void time_line_advance(float dt, TimeLine*);           (1)
int  time_line_cross(const TimeLine*, float interval); (2)
1 advance time by dt (should be called at every time step)
2 return true if there exist integer n such that n*interval is in between current and previous times, in other words has the simulation just crossed n*interval for some n.

Getters:

float time_line_get_current  (const TimeLine*); (1)
long  time_line_get_iteration(const TimeLine*); (2)
1 get current time
2 get iteration id

The time structure can also be initialised via restart mechanism:

void time_line_strt_dump(MPI_Comm, const char *base, int id, const TimeLine*); (1)
void time_line_strt_read(const char *base, int id, TimeLine*); (2)
1 dump time line state to a restart file in text format
2 read restart file to initialize time line state

29.5. Time step

Time step manager.

Two possible modes:

  • const: constant timestep

  • disp: adaptive time step based on the maximum acceleration of given particles

29.5.1. interface

allocate, deallocate the object:

void time_step_ini(const Config*, /**/ TimeStep**); (1)
void time_step_fin(TimeStep*);
1 The mode is initialised from configuration file. See Configuration for syntax.
float time_step_dt(TimeStep*, MPI_Comm, TimeStepAccel*); (1)
float time_step_dt0(TimeStep*);                          (2)
void time_step_log(TimeStep*);                           (3)
1 Compute and return current timestep depending on the mode. See acceleration manager for ImeStepAccel argument.
2 return previous timestep
3 print time step informations to screen

29.5.2. acceleration manager

Helper structure to pack multiple arrays of acceleration

allocate, deallocate the structure:

void time_step_accel_ini(/**/ TimeStepAccel**);
void time_step_accel_fin(TimeStepAccel*);

interface:

void time_step_accel_push(TimeStepAccel*, float m, int n, Force*); (1)
void time_step_accel_reset(TimeStepAccel*); (2)
1 add an array of forces. m is the mass of the particles.
2 remove all pushed arrays

29.5.3. Configuration

constant timestep dt = 1e-3 (no log):

time = {
     type = "const"
     dt = 1e-3
     screenlog = false
}

adaptive timestep with maximum dt = 1e-2 and maximum displacement dx = 0.1 (active log):

time = {
     type = "disp"
     dt = 1e-2
     dx = 0.1
     screenlog = true
}

30. sim: simulation

manages all modules together to run a simulation

30.1. interface

Allocate and deallocate the structure

void sim_ini(const Config*, MPI_Comm, /**/ Sim**);
void sim_fin(Sim*);

Run simulation:

void sim_gen(Sim*);  (1)
void sim_strt(Sim*); (2)
1 run from scratch: this will generate all quantities in two stages (solvent only, equilibrating; freezing stage, adding walls and solutes)
2 run from restart files

30.2. submodules

30.2.1. objects

Manages all solutes objects. This includes rigid objects and membranes.

Interface

Allocate, deallocate the structure:

void objects_ini(const Config*, const Opt*, MPI_Comm, const Coords*, int maxp, Objects**);
void objects_fin(Objects*);

Generate the quantities:

void objects_gen_mesh(Objects*);                           (1)
void objects_remove_from_wall(const Sdf *sdf, Objects *o); (2)
void objects_gen_freeze(PFarray*, Objects*);               (3)
1 Generate the meshes of all objects from the template mesh and an initial condition files (see [mesh_gen])
2 Remove all meshes colliding with walls
3 Build all remaining quantities after removal from walls. The rigid objects need solvent particles

The quantities can be instead generated from restart files:

void objects_restart(Objects*);

The state of the objects can be dumped using

void objects_mesh_dump(Objects*);                    (1)
void objects_diag_dump(float t, Objects*);           (2)
void objects_part_dump(long id, Objects*, IoBop*);   (3)

void objects_strt_templ(const char *base, Objects*);         (4)
void objects_strt_dump(const char *base, long id, Objects*); (5)
1 Dump all meshes of objects
2 Dump diagnosis. This include the solid objects dumps and the membrane centers of mass.
3 Dump rigid particles in bop format
4 Dump rigid templates to restart files
5 Dump current state to restart files

The objects are updated using

void objects_clear_vel(Objects*);         (1)
void objects_advance(float dt, Objects*); (2)
void objects_distribute(Objects*);        (3)
void objects_update_dpd_prms(float dt, float kBT, Objects*); (4)
1 set all velocities to 0
2 advance objects for one timestep; internal fast forces are computed there for time separation scheme of membranes
3 redistribute objects accross nodes
4 update dpd parameters given a new time step

The objects forces can be managed by calling

void objects_clear_forces(Objects*);                 (1)
void objects_body_forces(const BForce*, Objects *o); (2)
1 set all forces to 0
2 apply body forces to the objects

The internal membrane forces are computed inside advance (see above).

Other forces can be set via the getter functions (see link with objinter : object interactions):

bool objects_have_bounce(const Objects*);

void objects_get_particles_all(Objects*, PFarrays*);    (1)
void objects_get_particles_mbr(Objects*, PFarrays*);    (2)
void objects_get_accel(const Objects*, TimeStepAccel*); (3)

void objects_get_params_fsi(const Objects*, const PairParams*[]);           (4)
void objects_get_params_adhesion(const Objects*, const PairParams*[]);      (5)
void objects_get_params_repulsion(const Objects*, const WallRepulsePrm*[]); (6)
void objects_get_params_cnt(const Objects*, const PairParams***);           (7)
1 get particle infos of all objects
2 get particle infos of membranes
3 retrieve acceleration infos (see Time step)
4 get the pair parameters for fsi interactions (one per object, NULL for no interactions)
5 get the parameters for adhesion interactions with walls (one per object, NULL for no interactions)
6 get the parameters for repulsion with walls (one per object, NULL for no interactions)
7 get the pair parameters for cnt interactions (one per object pair, NULL for no interactions)
void objects_activate_mbr_bounce(Objects*);
void objects_bounce(float dt, float flu_mass, const Clist *flu_cells, long n, const Particle *flu_pp0, Particle *flu_pp, Objects *obj); (1)
void objects_recolor_flu(Objects*, PFarray *flu); (2)
double objects_mbr_tot_volume(const Objects*);    (3)
1 bounce solvent particles against objects
2 recolor solvent particles located inside membranes
3 compute the total volume occupied by all membranes

30.2.2. objinter : object interactions

Helper to manage interactions between objects (contact forces) and interactions with solvent particles (fsi forces).

Allocate, deallocate the structure:

void obj_inter_ini(const Config*, const Opt*, MPI_Comm, float dt, int maxp, /**/ ObjInter**);
void obj_inter_fin(ObjInter*);

Interface:

void obj_inter_forces(ObjInter*, const PairParams **fsi_prms, const PairParams **cnt_prms, PFarray *flu, const int *flu_start, PFarrays *obj); (1)
1 compute interactions (contact and fsi), bulk and halo

30.2.3. walls

Helper to manage solid walls.

Allocate, deallocate the structure:

void wall_ini(const Config*, int3 L, Wall**);
void wall_fin(Wall*);

Generate from solvent particles:

void wall_gen(MPI_Comm, const Coords*, OptParams, bool dump_sdf,
              /*io*/ int *n, Particle *pp, /**/ Wall*);

Alternatively, can be generated from restart files:

void wall_restart(MPI_Comm, const Coords*, OptParams, bool dump_sdf,
                  const char *base, /**/ Wall*);

The restart file is dumped by calling

void wall_dump_templ(const Wall*, MPI_Comm, const char *base);

Interactions with other quantities are possible with the functions

void wall_interact(const Coords*, const PairParams*[], Wall*, PFarrays*);
void wall_adhesion(const Coords*, const PairParams*[], Wall*, PFarrays*);
void wall_bounce(const Wall*, const Coords*, float dt, PFarrays*);
void wall_repulse(const Wall*, const WallRepulsePrm *[], PFarrays*);

The wall velocity can be updated in time with

void wall_update_vel(float t, Wall *);

Getter functions:

void wall_get_sdf_ptr(const Wall*, const Sdf**);           (1)
double wall_compute_volume(const Wall*, MPI_Comm, int3 L); (2)
1 get a pointer to sdf: signed distance function object
2 compute the volume inside the walls (where the solvent can be located). This is evaluated via Monte Carlo

31. sdf: signed distance function

The Signed Distance Function (SDF) \(S(x, y, z)\) is used to give the distance from a given position to the wall. The sign of the SDF field tells if the position is inside or outside the wall.

Conventions:

  • \(S(x,y,z) < -1\) : inside domain, far from wall

  • \(S(x,y,z) < 0\) : inside domain

  • \(S(x,y,z) = 0\) : wall interface

  • \(S(x,y,z) > 0\) : inside wall

  • \(S(x,y,z) > 1\) : inside wall, far from surface

Example of sdf function in 2D:

sdf

The above example denotes a circular wall of radius 2. The isosurface \(S(x,y,z) = 0\) is located at the wall interface.

31.1. interface

allocate, deallocate the structure:

void sdf_ini(int3 L, Sdf**);
void sdf_fin(Sdf*);

interface:

void sdf_gen(const Coords*, MPI_Comm cart, bool dump, Sdf*); (1)
void sdf_get_view(const Sdf*, /**/ Sdf_v*);                  (2)
1 generate the sdf field from file "sdf.dat"
2 get view to be passed to kernels

device interface:

_I_ float3 sdf_ugrad(const Sdf_v *texsdf, const float3 *r) (1)
_I_ bool sdf_far(const Sdf_v *sdf, float x, float y, float z) (2)
_I_ float sdf_eval(const Sdf_v *sdf0, float x, float y, float z)  (3)
1 returns gradient of sdf (normalised) at given position (finite difference)
2 returns true if position is far inside the domain (approximation)
3 returns the sdf value at given position (linear interpolation)

Bounce back from wall:

void sdf_bounce(float dt, const WvelStep*, const Coords*, const Sdf*, int n, /**/ Particle*);

Sorting tools:

void sdf_bulk_wall(const Sdf*, /*io*/ int *s_n, Particle *s_pp, /*o*/ int *w_n, Particle *w_pp); (1)
int  sdf_who_stays(const Sdf*, int n, const Particle*, int nc, int nv, /**/ int *stay);          (2)
double sdf_compute_volume(MPI_Comm, int3 L, const Sdf*, long nsamples);                          (3)
1 extract wall particles from solvent particles. Solvent particles which are not inside the domain are not kept.
2 label nc objects of nv particles to inside domain or not.
3 compute the volume between walls using monte carlo integration. nsamples is the number of samples per subdomain.

31.2. submodules

Some tasks of the SDF object are splitted into submodules:

31.2.1. array3D

a simple data holder for storing a 3D cuda array scalar field.

data structure
struct Array3d {
    cudaArray_t a;  (1)
    size_t x, y, z; (2)
};
1 data on device
2 grid size
interface
struct Array3d;
void array3d_ini(Array3d**, size_t x, size_t y, size_t z);                (1)
void array3d_fin(Array3d*);                                               (2)
void array3d_copy(size_t x, size_t y, size_t z, float *D, /**/ Array3d*); (3)
1 allocate the structure
2 deallocate the structure
3 copy data from host to cuda array on device

31.2.2. bounce

Particles crossing the wall surface are bounced back into the domain. We define:

  • \(\mathbf{r}_0, \mathbf{v}_0\) position and velocity before the particle has crossed the surface

  • \(\mathbf{r}_1, \mathbf{v}_1\) position and velocity after the particle has crossed the surface, not bounced back

  • \(\mathbf{r}_n, \mathbf{v}_n\) position and velocity after bounce-back

  • \(\mathbf{r}_w, \mathbf{v}_w\) position and velocity of the wall at collision point

The collision point is found by solving for the collision time \(h\)

\[S(\mathbf{r}_0 + h \mathbf{v}_0) = 0\]

with Newton iterations. We obtain \(\mathbf{r}_w = \mathbf{r}_0 + h \mathbf{v}_0\).

The bounced particle is set to

\[\mathbf{r}_n = \mathbf{r}_w + (dt - h) \mathbf{v}_n \\ \mathbf{v}_n = 2 \mathbf{v}_w - \mathbf{v}_0\]
interface
void bounce_back(float dt, const WvelStep *wv, const Coords *c, const Sdf *sdf, int n, /**/ Particle *pp);

31.2.3. field

read and manage a scalar field on host

interface

allocate, deallocate:

void field_ini(const char *path, /**/ Field**); (1)
void field_fin(Field*);                         (2)
1 allocate and read field from file "path"
2 deallocate the field structure

getter functions:

void field_size(const Field*, /**/ int N[3]);       (1)
void field_extend(const Field*, /**/ float ext[3]); (2)
void field_data(const Field*, /**/ float**);        (3)
1 get grid size
2 get extents of the grid
3 get pointer to data
void field_sample(const Field*, const Tform*, const int N[3], /**/ Field**); (1)
void field_dump(const Field*, const Coords*, MPI_Comm cart); (2)
void field_scale(Field*, float scale); (3)
1 sample field to a new grid size N
2 dump field to "wall.h5"
3 scale field by scale

31.2.4. label

a simple tool to mark particles in wall regions

labels can be:

    LABEL_BULK,  /* remains in bulk */
    LABEL_WALL,  /* becomes wall particle */
    LABEL_DEEP   /* deep inside the wall */
void wall_label_dev(const Sdf *sdf, int n, const Particle*, /**/ int *labels); (1)
void wall_label_hst(const Sdf *sdf, int n, const Particle*, /**/ int *labels); (2)
1 set labels on an array of particles on device
2 set labels on an array of particles on host

31.2.5. tex3D

a simple texture object manager for representing 3d grid scalar field

data structures
struct Tex3d_v { cudaTextureObject_t t; }; (1)
struct Tex3d   { cudaTextureObject_t t; }; (2)
1 view, to be passed on device
2 Host structure
interface

allocate and destroy texture object:

void tex3d_ini(Tex3d**);
void tex3d_fin(Tex3d*);

manipulate texture object:

void tex3d_copy(const Array3d*, /**/ Tex3d*);    (1)
void tex3d_to_view(const Tex3d*, /**/ Tex3d_v*); (2)
1 create texture object from Array3d object
2 create a view to be passed to kernels

31.2.6. tform

A simple set of linear transformations for sdf. Based on tform module.

/* T: texture size, N: sdf file size, M: wall margin */
void tex2sdf_ini(const Coords*, const int T[3], const int N[3], const int M[3], /**/ Tform*); (1)
void out2sdf_ini(const Coords*, const int N[3], /**/ Tform*); (2)
void sub2tex_ini(const Coords*, const int T[3], const int M[3], /**/ Tform*); (3)
1 transform from texture coordinates to sdf coordinates
2 transform from Glbal coordinates to sdf coordinates
3 transform from subdomain coordinates to texture coordinates

32. Struct: helper structures

Helper structures to pass particle data between modules.

32.1. partlist

Part list: a structure to handle a list of dead or alive particles Purpose: kill particles during redistribution or cell lists build

32.1.1. data structure

struct PartList {
    const Particle *pp;   /* particles array          */
    const int *deathlist; /* what pp is dead or alive */
};

32.1.2. interface

contains only device interface:

static __device__ bool is_dead(int i, const PartList lp) {
    if (lp.deathlist)
        return lp.deathlist[i] == DEAD;
    return false;
}

static __device__ bool is_alive(int i, const PartList lp) {
    return !is_dead(i, lp);
}

32.2. parray

Generic particle array to be passed to interaction kernels.

32.2.1. host interface

A PaArray structure can be configured by pushing particles and, optionally, colors:

void parray_push_pp(const Particle *pp, PaArray *a);
void parray_push_cc(const int *cc, PaArray *a);
Particles must be pushed before colors

Client can check if the colors has been pushed by calling

bool parray_is_colored(const PaArray *a);

The particle array data are meant to be passed to kernels. In order to inline the fetching mode (see device interface), different views can be generated from the PaArray structure:

void parray_get_view(const PaArray *a, PaArray_v *v);  (1)
void parray_get_view(const PaArray *a, PaCArray_v *v); (2)
1 Get a view to fetch particles without colors
2 Get a view to fetch particles with colors

32.2.2. device interface

Generic particle (see pair module) can be fetched depending on the view:

_I_ void parray_get(PaArray_v a, int i, /**/ PairPa *p) (1)
_I_ void parray_get(PaCArray_v a, int i, /**/ PairPa *p) (2)
1 fetch position and velocity only
2 fetch position, velocity and color

32.3. farray

Generic force array to be passed to interaction kernels. Implemented versions:

  • forces only

  • forces and stresses

32.3.1. host interface

A FoArray structure can be configured by pushing forces and, optionally, stresses:

void farray_push_ff(Force *ff, FoArray *a);
void farray_push_ss(float *ss, FoArray *a);
Forces must be pushed before stresses

Client can check if the stresses has been pushed by calling

bool farray_has_stress(const FoArray *a);

The force array data are meant to be passed to kernels. In order to inline the forces computations mode (see device interface), different views can be generated from the FoArray structure:

void farray_get_view(const FoArray *a, FoArray_v *v);  (1)
void farray_get_view(const FoArray *a, FoSArray_v *v); (2)
1 Get a view to compute forces only
2 Get a view to compute forces and stresses

32.3.2. device interface

Generic forces (see pair module) can be added to the generic force array depending on the view type:

template <int S>
_I_ void farray_add(const PairFo *f, int i, /**/ FoArray_v a) (1)
template <int S>
_I_ void farray_add(const PairSFo *f, int i, /**/ FoSArray_v a) (2)
1 add forces only to the array
2 add forces and stresse to the array

The template parameter S is the sign, must be either -1 or 1. Note that it only affects the forces, stresses are always added with positive sign, as opposed to forces.

For equivalent atomic operations, the following functions can be called:

template <int S>
_I_ void farray_atomic_add(const PairFo *f, int i, /**/ FoArray_v a)
template <int S>
_I_ void farray_atomic_add(const PairSFo *f, int i, /**/ FoSArray_v a)

The generic Force structure can be initialised to 0 based on the type of the Generic array:

_I_ PairFo farray_fo0(FoArray_v)
_I_ PairSFo farray_fo0(FoSArray_v)

33. utils

33.1. utility functions

a set of utility functions

33.1.1. general tools

the following functions are provided in the imp.h header file.

memory management functions:

void emalloc(size_t, /**/ void**); (1)
void efree(void*); (2)
void *ememcpy(void *dest, const void *src, size_t n); (3)
1 safe malloc: throws an error on failure
2 safe free
3 safe memcpy

convenient macros for the above functions:

#define EMALLOC(n, ppdata) UC(emalloc((n)*sizeof(**(ppdata)), (void**)(ppdata)))
#define EFREE(pdata) UC(efree(pdata))
#define EMEMCPY(n, src, dest) ememcpy((void*)(dest), (const void*)(src), (n)*sizeof(*(dest)))

safe versions of the stdio functions:

void efopen(const char *fname, const char *mode, /**/ FILE**);
void efclose(FILE*);
void efread(void *ptr, size_t size, size_t nmemb, FILE*);
void efwrite(const void *ptr, size_t size, size_t nmemb, FILE*);
void efgets(char *s, int size, FILE*);

additional tools:

bool same_str(const char *a, const char *b); (1)
const char* get_filename_ext(const char*);
1 returns true if the two strings are the same

33.1.2. msg: messages

a simple logging tool for many rank application. a message is printed in the stderr stream by the master rank (rank 0). all ranks also print the message in file .XXX, where XXX is the MPI rank id.

void msg_ini(int rank); (1)
void msg_print(const char *fmt, ...); (2)
1 initialise the logger
2 log message. format is the same as printf

33.1.3. os: operating system tasks

void os_mkdir(const char* path); (1)
long os_time();                  (2)
void os_srand(long int seedval); (3)
double os_drand();               (4)
void os_sleep(unsigned int seconds); (5)
void os_malloc_stats();              // TODO
1 create a directory
2 returns local time
3 seed for os_drand()
4 returns double random number between 0 and 1
5 sleep for seconds seconds

33.2. error handling

uDeviceX has a primitive error handling mechanism. Error is raised if

  • ERR(fmt, …​) is called. Syntax of ERR is the same as one of printf(3).

  • a function wrapped with CC returns a error code

  • a function wrapped with MC returns a error code

If error is raised the uDeviceX aborts and prints a error message with a function call trace. Only the function calls wrapped with UC go into trace. For example,

void c() { ERR("c failed"); }
void b() { UC(c()); }
void a() { UC(b()); }
void main () {
    a();
}

produces a log with source file name and line number of two UC calls and ERR call.

33.3. cc: cuda check

[C]uda [C]heck macro.

Error handling for cuda API calls. usage:

CC(my_cuda_api_call());

CC uses internally UdxError in case of failure. This allows to dump a backtrace (see [error]).

33.4. mc: mpi check

[M]PI [C]heck macro.

Error handling for MPI API calls. usage:

MC(my_mpi_api_call());

MC uses internally UdxError in case of failure. This allows to dump a backtrace (see [error]).

33.5. kl: kernel launch

[K]ernel [L]aunch macro.

a macro for cuds kernel launches

All but unsafe skip kernels with zero thread number.

Examples:

KL(fun, (i+1, j, k), ())     -> fun<<<i+1, j, k>>>()
KL(fun, (i+1, j, k), (a, b)) -> fun<<<i+1, j, k>>>(a, b)

33.6. texo: texture object

a simple wrapper for texture objects in cuda.

template<typename T>
struct Texo {
    cudaTextureObject_t d;
};

on host: create and destroy the texture object of a given type T:

void texo_setup(int n, T *data, /**/ Texo<T> *to)
void texo_destroy(/**/ Texo<T> *to)

on device: fetch from texture object:

const T texo_fetch(const Texo<T> t, const int i)

34. wall

Manages frozen wall particles data.

34.1. data structure

The data is stored in a visible structure. The user can access directly the fields.

struct WallQuants {
    float4 *pp; /* particle positions xyzo xyzo ... */
    int n;      /* number of particles              */
    int3 L;     /* subdomain size                   */
};

struct WallTicket;
struct WallRepulsePrm;

Only positions of the particles are stored, Their velocity can be computed by using the wvel submodule.

The Ticket hidden structure contains helpers (cell lists, texture objects, random number generator) for force interactions with external particles.

34.2. interface

allocate and deallocate the data:

void wall_ini_quants(int3 L, WallQuants*);
void wall_ini_ticket(int3 L, WallTicket**);

void wall_fin_quants(WallQuants*);
void wall_fin_ticket(WallTicket*);

generate the quantities:

void wall_gen_quants(MPI_Comm, int numdensity, int maxn, const Sdf*, /* io */ int *o_n, Particle *o_pp, /**/ WallQuants*); (1)
void wall_gen_ticket(const WallQuants*, WallTicket*); (2)
1 freeze the particles from a given equilibrated array of solvent particles occupying the subdomain. The input array of particles is resized and only solvent particles remain. Creates additional "margin particles" copied from neighbouring ranks. See exch.
2 generate Ticket structure from quantities

restart interface:

void wall_strt_quants(MPI_Comm, const char *base, int maxn, WallQuants*); (1)
void wall_strt_dump_templ(MPI_Comm, const char *base, const WallQuants*); (2)
1 read quantities from restart files
2 dump quantities to restart files (set 0 velocity to particles)

compute interactions with other particles (see force submodule):

void wall_force(const PairParams*, const WvelStep *, const Coords*, const Sdf*, const WallQuants*,
                const WallTicket*, int n, const PaArray*, const FoArray*); (1)

void wall_force_adhesion(const PairParams*, const WvelStep *, const Coords*, const Sdf*, const WallQuants*,
                         const WallTicket*, int n, const PaArray*, const FoArray*); (2)

void wall_repulse(const Sdf*, const WallRepulsePrm*, long n, const PaArray*, const FoArray*); (3)
1 compute pairwise interactions between wall particles and input particles
2 compute visco elastic interactions with wall particles within cutoff radius 1
3 compute field interaction depending on the sdf and input particles

The repulsion parameters are managed via the following interface:

void wall_repulse_prm_ini(float lambda, WallRepulsePrm**); (1)
void wall_repulse_prm_ini_conf(const Config*, const char *ns, WallRepulsePrm**); (2)
void wall_repulse_prm_fin(WallRepulsePrm*); (3)
1 allocate and initialise the parameters
2 allocate and initialise the parameters from config file
3 deallocate the parameters

34.3. submodules

34.3.1. exch

a simple module to exchange the "margin particles" between neighbouring ranks (on host). These particles are used in interactions and act as ghost particles.

interface:

void wall_exch_pp(MPI_Comm cart, int numdensity, int3 L, int maxn, /*io*/ Particle *pp, int *n);

maxn is the maximum number of particles that can be stored in the pp buffer.

34.3.2. force

compute interaction forces between frozen wall particles and other particles.

interface:

void wall_force_apply(const PairParams*, const WvelStep*, const Coords*, const PaArray*, int n, RNDunif*, WallForce,
                      /**/ const FoArray*); (1)

void wall_force_adhesion_apply(const PairParams*, const WvelStep*, const Coords*, const PaArray*, int n, RNDunif*, WallForce,
                               /**/ const FoArray*);

void wall_force_repulse(Sdf_v, WallRepulsePrm, long n, const PaArray*, const FoArray*); (2)
1 pairwise interactions: input particles with frozen particles
2 field interaction: input particles get force \(\mathbf{f}(s)\), where \(s\) is the sdf value at position of input particle.

The helper structure WallForce acts as a view to be passed to the kernel. It is defined as:

struct WallForce { /* local wall data */
    Sdf_v sdf_v;
    Texo<int> start;
    Texo<float4> pp;
    int n;
    int3 L;
};

The repulsive force \(\mathbf{f}\) is computed as:

\[\mathbf{f}(s) = - \mathbf{\nabla} s \; \phi(s),\]

where

\[\phi = \exp \left[ \lambda (s + 1) \right] - 1,\]

when \(s>-1\) and 0 otherwise. For \(\lambda = 1\):

repulsive wall

34.3.4. wvel

Wall velocity field. Used to set velocity to wall particles for wall interactions. Also used to get wall velocity for bounce back on walls

data structures

data structures are separated into 3 kinds of data structures:

  • Wvel (hidden), which contains all informations needed for the velocity field. It is hold and maintained on host.

  • WvelStep (hidden), containing only the information needed for one specific time step. It is generated from Wvel structure.

  • Views, passed to kernels. They are generated from the WvelStep structure.

interface

Interface is splitted between host and device

Host

Host code is responsible for initialisation of Wvel and convert Wvel to a view at a given timestep.

alloc, free structures:

void wvel_ini(Wvel **wv);
void wvel_fin(Wvel *wv);

void wvel_step_ini(WvelStep **wv);
void wvel_step_fin(WvelStep *wv);

Set velocity type from parameters:

void wvel_set_cste(float3 u, Wvel *vw);
void wvel_set_shear(float gdot, int vdir, int gdir, int half, Wvel *vw);
void wvel_set_shear_sin(float gdot, int vdir, int gdir, int half, float w, Wvel *vw);
void wvel_set_hs(float u, float h, Wvel *vw);
void wvel_set_timestep(float dt, Wvel *vw);

Set velocity type from configuration:

void wvel_set_conf(const Config *cfg, Wvel *vw);

Get step structure:

void wvel_get_step(float t, const Wvel *wv, /**/ WvelStep *);

Get view structures:

void wvel_get_view(const WvelStep *w, /**/ WvelCste_v *v);
void wvel_get_view(const WvelStep *w, /**/ WvelShear_v *v);
void wvel_get_view(const WvelStep *w, /**/ WvelHS_v *v);

The convertion to view structures can be dispatch using the type of the parameters:

enum {
    WALL_VEL_V_CSTE,
    WALL_VEL_V_SHEAR,
    WALL_VEL_V_HS,
};

which can be retrieved from the Step structure using

int wvel_get_type(const WvelStep *w);
Device

Velocity field can be retrieved, depending on the view structure, using

static __device__ void wvel(WvelCste_v p, Coords_v c, float3 r, /**/ float3 *v)
static __device__ void wvel(WvelShear_v p, Coords_v c, float3 r, /**/ float3 *v)
static __device__ void wvel(WvelHS_v p, Coords_v c, float3 r, /**/ float3 *v)

Bounce function: gives bounced-back velocity from position and velocity of particle.

template <typename Wvel_v>
static __device__ void bounce_vel(Wvel_v wv, Coords_v c, float3 rw, /* io */ float3* v)
Configuration
constant

constant velocity \(u = (1, 0, 0)\):

wvel = {
    type = "constant"
    u    = [1.0, 0.0, 0.0]
}
shear

shear velocity \(u = (3 y, 0, 0)\):

wvel = {
    type = "shear"
    gdot = 3.0
    vdir = 0
    gdir = 1
    half = 0
}
sinusoidal shear

shear velocity \(u(t) = (3 y \sin(5 t), 0, 0)\):

wvel = {
    type = "shear sin"
    gdot = 3.0
    vdir = 0
    gdir = 1
    w = 5.0
}
Hele Shaw

velocity \(u_r = \frac u r \left(1 - \frac {4 z^2}{h^2}\right)\):

wvel = {
    type = "hele shaw"
    u = 1.0
    h = 8.0
}

35. third party dependencies

35.2. MPI library

To get the flags

mpicxx -show

or

pkg-config --libs mpich

35.3. libconfig

35.3.1. from source

v=1.7.1
wget https://hyperrealm.github.io/libconfig/dist/libconfig-$v.tar.gz
tar zxvf libconfig-$v.tar.gz
cd libconfig-$v
./configure --prefix=${HOME}/prefix/libconfig --disable-cxx
make -j
make install

35.3.2. from git

git clone git@github.com:hyperrealm/libconfig.git
cd libconfig
autoreconf
./configure --prefix=${HOME}/prefix/libconfig --disable-cxx
make -j
make install

35.3.3. from apt

sudo apt install libconfig-dev

35.3.4. pkgconfig

add path for pkg config

export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$HOME/prefix/libconfig/lib/pkgconfig

35.4. hdf5

a data model, library, and file format for storing and managing data

To get the compilation and linking flags

h5c++ -show

or

pkg-config hdf5-mpich --libs

To build from the source

Configuration options

      ./configure --prefix=$HOME/prefix/hdf5 --enable-parallel CXX=mpic++ CC=mpicc

35.5. bop

a simple format for particle data, see bop.

installation:

git clone git@github.com:amlucas/bop.git
cd bop
./configure [OPTIONS] (1)
make
make -s test
1 see ./configure --help

36. wrappers

36.1. intro

u.run respects several environment variables to modify the way udx is run

36.2. MEM

if MEM is set udx is ran with cuda-memcheck and MEM is used as a list of parameters

MEM= u.test test/*
MEM='--leakcheck --blocking'              u.test test/*
MEM='--tool initcheck'                    u.test test/*

see also poc/memcheck

36.3. VAL

if VAL is set udx is ran with valgrind and VAL is used as a list of parameters.

VAL= u.test test/*
VAL="--leak-check=full --show-leak-kinds=all"  u.test test/*

u.run respect DRYRUN : only show the commands do not execute them

DRYRUN= VAL=--option u.run ./udx

module: load cray-hdf5-parallel cudatoolkit daint-gpu
cmd: srun -n 1 -u valgrind --option ./udx

36.4. PROF

if PROF is set is ran with nvprof

PROF="--export-profile main.prof" u.run ./udx
nvprof -i main.prof

See also poc/ppbandwidth

36.5. TIM

if TIM is set is ran with time