git clone git@github.com:frigaut/yao.git yao
The new parallel features (see next post), and especially the new sparse reconstructor options (see 2 posts below), make running very large systems much more tractable than before. A 100x100 Shack-Hartmann system (on a 400-pixel pupil) runs at 8 iterations/second on an 8-core Xeon-based machine @ 3.33GHz! So one can get statistically meaningful results in a few minutes. It's not too greedy in RAM either (by modern machine standards), at 120MB for the main process (our machine has 8GB of RAM).
A 64x64 system runs at 23 iterations/s and a 200x200 SHWFS at 1 iteration/s (note that the latter starts to be in the realm of ELT extreme AO: 15cm actuator pitch on a 30-m telescope).
More work on yao parallel:
8-core Xeon (2 boards) @ 3.33GHz, running RHEL 5, 32 bits. yorick/yao built with -O2. 16x16 SHWFS system, physical model, no noise. All times (avg & median) in milliseconds for one frame.

sh16x16_svipc:  1 thread,  sh_wfs() avg=2.86, median=2.90
sh16x16_svipc:  2 threads, sh_wfs() avg=1.87, median=1.89
sh16x16_svipc:  4 threads, sh_wfs() avg=1.37, median=1.37
sh16x16_svipc:  8 threads, sh_wfs() avg=1.14, median=1.15
sh16x16_svipc: 16 threads, sh_wfs() avg=2.33, median=2.30

You can see that the speed increases up to 8 forks (max gain = 2.5x); adding more forks makes things slower. The gain does not reach larger values because (1) there are overheads associated with the use of shared memory/semaphores, and (2) there is also a global overhead in the sh_wfs() function. What is parallelized is the core of sh_wfs() (FFTs, noise application and slope calculations from the spots), but any overhead within the sh_wfs() function itself, before the FFT/slope calculation routines are called, does not gain from the parallelization.
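As a sanity check, here is a back-of-the-envelope fit of my own (not anything computed by yao): model the per-frame time as a serial part plus a perfectly parallel part,

t(N) = t_s + t_p/N

Fitting the 1- and 8-fork numbers gives t_p ≈ 2.0 ms and t_s ≈ 0.9 ms. That model predicts 1.87 ms at 2 forks and 1.38 ms at 4 (measured: 1.89 and 1.37), and it says sh_wfs() can never drop below ~0.9 ms however many forks you add. At 16 forks the model would predict ~1.0 ms; the measured 2.33 ms is the shared memory/semaphore overhead taking over.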
The gain brought by the other, more general parallel features (DM/WFS & PSFs) is nicely illustrated by the following example. This was run on the same machine (8-core Xeon), with curvature.par (a 36-subaperture curvature system run on a 120-pixel pupil). Look at the following results (I have omitted the timing results for items that are of no concern here and were negligible):
sim.svipc=0 (no parallelization)
  time(1-2) = 9.89 ms (WF sensing)
  time(3-4) = 0.01 ms (cMat multiplication)
  time(4-5) = 1.04 ms (DM shape computation)
  time(5-6) = 5.84 ms (Target PSFs estimation)
  59.18 iterations/second on average

sim.svipc=1 (WFS/DM parallelized)
  time(1-2) = 4.11 ms (WF sensing)
  time(3-4) = 0.01 ms (cMat multiplication)
  time(4-5) = 0.94 ms (DM shape computation)
  time(5-6) = 5.78 ms (Target PSFs estimation)
  91.43 iterations/second on average

sim.svipc=2 (PSFs parallelized)
  time(1-2) = 10.69 ms (WF sensing)
  time(3-4) = 0.01 ms (cMat multiplication)
  time(4-5) = 1.05 ms (DM shape computation)
  time(5-6) = 1.23 ms (Target PSFs estimation)
  76.41 iterations/second on average
That's a typical example. The "no parallel" case is dominated by the WFSing (10 ms) and PSF (6 ms) calculations. When parallelizing the WFS (sim.svipc=1), the apparent WFSing time goes down to 4 ms: 6 ms of it can now be done in parallel with other tasks, here the PSF calculation, which itself takes 6 ms. With sim.svipc=2 (only the PSFs are parallelized), the PSF calculation time drops to basically 0 while the WFSing stays at 10 ms. It all makes sense. I haven't included the sim.svipc=3 results as they are a little more difficult to understand and would just introduce confusion.
The individual WFS parallel feature is controlled by the parameter wfs.svipc. The "global" DMs/WFSs and PSFs parallelization is controlled by sim.svipc (bit 0 controls DM/WFS and bit 1 controls PSFs, thus sim.svipc=3 means both are turned on).
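To make that concrete, here is what the settings could look like in a parfile. This is a sketch only; in particular the per-WFS indexing (wfs(1).svipc) and the values are my guesses, so check the yao documentation for the actual syntax:

// hypothetical parfile fragment: turn on both global parallel features
sim.svipc = 3;      // bit 0 = DM/WFS parallelization, bit 1 = PSFs; 3 = both
// and fork the sh_wfs() core of the first WFS
wfs(1).svipc = 8;   // per-WFS forks; 8 was the sweet spot on our 8-core Xeon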
Of course, if you want to use yao's parallel facilities, you will need to install the yorick-svipc plugin (contributed by Matthieu Bec), available through most of the normal channels. It runs on Linux (extensively tested) and OS X (somewhat tested).
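It should install like any other plugin from the yorick prompt; the package name below is my assumption, so check pkg_list for the real one:

> #include "pkg_mngr.i"
> pkg_sync
> pkg_install,"svipc"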
Marcos van Dam (flatwavefronts.com), who is using yao to simulate the GMT AO system, has contributed a patch that adds MMSE control matrices, with a couple of regularization options (identity or laplacian) and a sparse option. The sparse option makes it blazingly fast (both the inversion and the reconstruction show more than an order of magnitude speed gain).
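Schematically (my notation, not necessarily the one used in the patch), a regularized MMSE/least-squares control matrix looks like

R = (HᵀH + αC)⁻¹ Hᵀ

where H is the interaction matrix, C is the regularization matrix (identity or a discrete Laplacian) and α its weight. For a Shack-Hartmann system, H and the Laplacian C are mostly zeros, so storing them as sparse matrices speeds up both the inversion and the per-iteration reconstruction, hence the order-of-magnitude gain.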
Aurea Garcia-Rissmann (UofA), adapting code developed by Mark Milton (at the time at UofA), contributed a new modal basis for the DMs: the disk harmonics. These represent well the kind of deformation produced by deformable secondaries, for instance. They are much better behaved than the Zernikes at the edge.
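For the curious, and from memory rather than from the contributed code (see Mark Milton's disk harmonic paper for the exact definition and normalization), these modes are Bessel-based, schematically

DH_{n,m}(r,θ) ∝ J_m(k_{n,m} r) × cos(mθ) or sin(mθ)

with the radial frequencies k_{n,m} set by a boundary condition at the pupil edge. That boundary condition is what keeps them gentle at the rim, where high-order Zernike radial polynomials swing steeply.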
Thanks Marcos and Aurea. The code was very clean, hence -relatively- easy to merge.
Following the development here at Gemini of an svipc plugin (by Matthieu Bec) providing shared memory capabilities for yorick, I have been working on parallelizing yao. The results so far are encouraging. For instance, I get a 4x performance boost on simulations of our MCAO system here, from 12 it/s to 47 it/s on an 8-core Xeon machine. This will of course work best for systems with multiple WFSs and/or DMs, but will probably also bring some improvement for simpler systems.
Right now, I have implemented this in our version of yao, which we forked to accommodate our special MCAO needs. But the functions are rather generic and I do not foresee major hurdles in porting them to the main yao tree. I just need to find a few free hours... So stay tuned!
I have gotten around to working on yao once more. Version 4.5.0 brings a lot of new features, bug fixes and code re-organization. Here is the changelog:
Upgrades:

Bug Fixes:
So, as you can see, lots of new things. I apologize to those who knew the yao code well and for whom this release is going to mean some work to find where the functions have gone, but I think the code re-organization was way overdue, and the result is cleaner. Many of these modifications were the result of a visit from Michael Hart (and some push from Marcos van Dam) to use yao for GMT simulations. As usual, I have benefited from many discussions, advice and code contributions from Damien Gratadour, Benoit Neichel, Yann Clenet and Eric Gendron.
Things I have missed in this release but that I firmly intend to implement in the next version:
Is the performance of today's computers still following Moore's law? If so, taking my numbers from the performance section, which date back to 2004, we should get for the fastest case (sh6m2-bench.par) 170*2^5 = 5440 it/s. That is not the case, but I recently re-ran some tests on typical cases and I am happy to announce that the threshold of 1000 iterations/second has been passed. Here are some performance numbers, obtained on a MacBook Pro 17" (running Snow Leopard, aka 10.6.1, 2.66GHz, 4GB RAM, 64-bit yorick and plugins). All simulations were run with 4 phase screens, and the display on, unless noted otherwise. Results are in iterations/second.
Name | Case | 64 bits (it/s) | 32 bits (it/s) |
shfast | SH 6x6 geometrical, 1DM+TT, NGS, 1 screen, no display | 1020 | 860 |
test1 | SH 12x12 physical 1DM+TT, LGS | 44 | 32 |
test2 | SH 12x12 geometrical, 1DM+TT, NGS | 219 | 182 |
test3 | SH 6x6 physical 1DM+TT, NGS | 201 | 165 |
test4 | CWFS36, NGS | 226 | 164 |
test4bis | same, no display | 261 | 184 |
sh20 | SH 20x20 geometrical, 1DM+TT, NGS, 1 screen, no display | 284 | N/A |
sh20_2 | SH 20x20, physical, 1 screen, w/ display | 59 | N/A |
Those are all, I believe, fairly impressive numbers. Since I acquired this MacBook only recently, I compared 32-bit and 64-bit yorick builds; hence the last column of the table. Building/running in 64 bits actually makes a fairly big difference on Snow Leopard. On average, the 64-bit version is 25% faster than the 32-bit one (up to 42% faster in some cases). That's a big gain.
I got several requests recently to help install yao on OS X (I believe the yao packages have been built for 10.5, so I don't think the following would work on Tiger, 10.4). So, I wrote up some fairly detailed instructions. Here is the drill. Note that I do not believe there is any prerequisite (besides X11, of course); I tested this on a fresh test account on my mac and it worked. FFTW comes bundled with the yao package, so it is not necessary to install it.
Note: in fact, the following should also work on linux; just select the appropriate OS in pkg_setup when asked for it.

First, put yorick in your path:

export PATH=$HOME/yorick-2.1.05/bin:$PATH

pkg_mngr.i is now bundled with yorick, so just do the following (for pkg_setup, all the defaults are good, so carriage return everywhere, except that you will have to select the OS: darwin-i686 on modern apple machines):
$ yorick
> #include "pkg_mngr.i"
> pkg_setup
> pkg_sync
> pkg_list
> pkg_install,"yao"
> quit
$ cd ~
$ mkdir -p .yorick/data
$ cd .yorick/data
$ yorick -i turbulence.i
> require,"yao.i"
> CreatePhaseScreens,2048,256,prefix="screen"
> quit
$ ls   (you should have your screens)
$ cd ~/yorick-2.1.05/share/yao/examples/

That's where some example yao parfiles are. You can then put them wherever you want and transform/adapt them at will.
$ yorick -i yao.i
> aoread,"sh6x6.par"
> aoinit,disp=1
> aoloop,disp=1
> go

watch it go :-)
I know many people have had problems installing yao with the newest yorick releases, since the yorick Makefile system has been overhauled: many things have changed. Well, now you can install yao *very* easily from the yorick prompt. Just download the brand new binary plugin installer (requires yorick > 1.6.02). To install the latest yao:
#include "pkg_mngr.i" pkg_list pkg_install,"yao"
Beware that the install directive will install into yorick/contrib/yao, so if you have an existing installation there that you want to preserve, back it up first.
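Something along these lines should do (the destination is arbitrary; Y_SITE is yorick's site directory, which you can find by printing Y_SITE at the yorick prompt):

$ cp -a /path/to/Y_SITE/contrib/yao ~/yao-backup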
This yao version is kind of intermediate. I have not included all of Ralf's modifications. I have however made a fairly decent-size change to aoloop:
Now aoloop is in 2 parts:

aoloop,parameters_as_before

to "load" the thing, then type

go

or

go,1

to run a single iteration.
While it's going, you still have access to the prompt. You can type "stop" or "cont" to stop/resume the loop simulation.
If you write batch scripts in which you run loops, you will have to write them like this:

aoloop,bla-bla-bla
for (j=1;j<=loop.niter;j++) go;

This is because a call to go only runs one loop cycle and then calls "set_idler,go", which tells the yorick interpreter "if you have nothing pending, run 'go'". This works as intended when run interactively, but in a batch the interpreter is never idle: it executes the next line directly and never comes back to "go" unless forced to (hence the explicit loop).
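To see why, here is the idler pattern in a nutshell. set_idler is a standard yorick built-in; everything else below is a made-up stand-in for what go does, not yao's actual code:

my_niter = 100;   // how many iterations we want
my_count = 0;

func my_go(void)
{
  extern my_count;
  // ...one AO loop iteration would happen here...
  my_count += 1;
  if (my_count < my_niter) set_idler, my_go;  // re-arm for the next idle moment
}

Typed interactively, my_go re-arms itself every time the interpreter goes idle; in a batch script the interpreter never idles, so the chain stops after the first call.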
Note added August 2: Indeed, Ralf has been very active. He is in the final phases of testing and has been revamping major parts of yao to be compatible with the new reconstructor choices. The reconstructors will very soon include MAP and MAP+PCG, on top of the existing SVD/least-squares. Note that this will also include a sparse implementation, so it's fast!
Below is the excerpt of the README file relevant to this release:
We are now at version 3.2.3, which fixes a few bugs:
Ralf submitted many bug reports/comments. Miska pointed out a bug in create_phase_screens, linked to SIGFPE being triggered by a division by zero. I did not see the problem on my G4 (I should have seen it on the G5, but did not bother going through the phase screen creation phase there). I have patched the problem in turbulence.i and put the new release on the web pages.
I have added some test parfiles in the examples directory (yao/examples). New users can test their distribution by running

#include "test-all.i"

in this directory. It's relatively fast, and tests various configs (SHWFS methods 1 and 2, with split subsystems, and a CWFS case).