Wednesday, February 22, 2012

USIMM

We've just released the simulation infrastructure for the Memory Scheduling Championship (MSC), to be held at ISCA this year.

The central piece in the infrastructure is USIMM, the Utah SImulated Memory Module. It reads in application traces, models the progression of application instructions through the reorder buffers of a multi-core processor, and manages the memory controller read/write queues. Every memory cycle, USIMM checks various DRAM timing parameters to determine the set of memory commands that can be issued in that cycle. It then hands control to a scheduler function that picks a command from this candidate set. An MSC contestant only has to modify the scheduler function, i.e., all changes are restricted to scheduler.c and scheduler.h. This clean interface makes it very easy to produce basic schedulers. Each of my students produced a simple scheduler in a matter of hours; these have been included in the code distribution as examples to help get one started.
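
To make the interface concrete, here is the skeleton of a trivial scheduler in the style of the included examples. This is only a sketch: it assumes the v1.x interface (request_t, read_queue_head[], the command_issuable flag, issue_request_command(), and the utlist macros) and it ignores write draining; see scheduler-fcfs.c in the distribution for the real thing.

#include "utlist.h"
#include "memory_controller.h"
#include "scheduler.h"

void init_scheduler_vars()
{
    /* One-time setup of any scheduler-private state. */
}

/* Called once per memory cycle for each channel. */
void schedule(int channel)
{
    request_t *req = NULL;

    /* Issue the oldest read whose next DRAM command is legal this cycle. */
    LL_FOREACH(read_queue_head[channel], req) {
        if (req->command_issuable) {
            issue_request_command(req);
            return;
        }
    }
}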

In the coming weeks, we'll release a number of traces that will be used for the competition.  The initial distribution includes five short single-thread traces from PARSEC that people can use for initial testing.

The competition will be judged in three different tracks: performance, energy-delay-product (EDP), and performance-fairness-product (PFP).  The final results will be based on the most current version of USIMM, as of June 1st 2012.

We request that contestants focus on scheduling algorithms that are easily implementable, i.e., doable within a few processor cycles and within a 68 KB storage budget.  A program committee will evaluate the implementability of your algorithm, among other things.

We'll post updates and bug fixes in the comments section of this blog post as well as to the usimm-users@cs.utah.edu mailing list (sign up here).  Users are welcome to use the blog or mailing list to post their own suggestions, questions, or bug reports.  Email usimm@cs.utah.edu if you have a question for just the code developers.

Code Updated on 04/17/2012:

Code download: http://www.cs.utah.edu/~rajeev/usimm-v1.3.tar.gz

Changes in Version 1.1: http://www.cs.utah.edu/~rajeev/pubs/usimm-appA.pdf
Changes in Version 1.2: http://www.cs.utah.edu/~rajeev/pubs/usimm-appB.pdf
Changes in Version 1.3: http://www.cs.utah.edu/~rajeev/pubs/usimm-appC.pdf

USIMM Tech Report: http://www.cs.utah.edu/~rajeev/pubs/usimm.pdf

The contest website: http://www.cs.utah.edu/~rajeev/jwac12/

Users mailing list sign-up:  http://mailman.cs.utah.edu/mailman/listinfo/usimm-users

69 comments:

  1. Is the competition meant for individuals? Or are we allowed to participate in teams?

  2. It'll take us about 2 more weeks to release traces. We want to make sure we're picking trace snapshots that are representative. If you're just looking for longer traces for testing, email us and we'll send you a link.

    On a related note, someone suggested that we add the PC for the cache miss in the trace. So in a week or two, we'll release version 1.1 that adds this support. If anyone would like other features added to version 1.1, please let us know soon.

    Replies
    1. Thank you for organizing a very interesting workshop. I have two requests for the new version. Please support the following features if you think they are appropriate for the championship.

      (Feature 1) Read/write with auto-precharge command
      The current USIMM implementation does not support the DDR3 auto-precharge feature. Supporting it would reduce command traffic and improve the performance of a closed-page policy.

      (Feature 2) Speculative activation
      I would like to activate specified rows speculatively, since speculative activations can significantly reduce memory access latency. Could you please add the following two APIs for speculative activation?
      1. int is_activate_allowed(int channel, int rank, int bank)
      2. int issue_activate_command(int channel, int rank, int bank, long long int row)
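
      As a concrete illustration of Feature 2 (purely hypothetical: these two calls are only the proposal above and do not exist in the current USIMM release, and predict_next_row() stands in for any user-supplied row predictor), a scheduler could open a predicted row when the command bus would otherwise be idle:

      /* Hypothetical use of the proposed speculative-activation API. */
      if (is_activate_allowed(channel, rank, bank)) {
          long long int row = predict_next_row(channel, rank, bank);
          issue_activate_command(channel, rank, bank, row);
      }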

  3. Yasuo, these are excellent suggestions. We'll try to include these in version 1.1.

  4. The PFP metric, as currently defined, is vulnerable to non-intuitive behavior and perhaps even gaming. I provide one example below.

    Let us consider a two-way multiprogrammed workload W, which consists of benchmarks X and Y.

    /* Benchmark Characteristics */
    X's delay (running by itself): 1
    Y's delay (running by itself): 1

    /* Workload W + Scheduler A */
    X's delay: 2
    Y's delay: 2

    /* Workload W + Scheduler B */
    X's delay: 1
    Y's delay: 2

    Between the two, Scheduler B clearly dominates Scheduler A: Scheduler B accelerates X (2->1) while not affecting Y (2->2). However, the PFP (performance-fairness product) metric tells a different story.

    /* Workload W + Scheduler A */
    "Performance": 2+2 = 4 (lower is better)
    Fairness: 2/2 = 1 (higher is better)
    PFP: 4/1 = 4 (lower is better)

    /* Workload W + Scheduler B */
    "Performance" : 1+2 = 3 (lower is better)
    Fairness: 1/2 = 0.5 (higher is better)
    PFP: 3/0.5 = 6 (lower is better)

    Here we see that PFP_A < PFP_B, denoting that Scheduler A is somehow better than B, when that is definitely not the case.
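
    In symbols (matching the arithmetic above, with T_i the delay of benchmark i within the workload and T_i^alone its delay running by itself; the official definitions are in the tech report):

    \[
    s_i = \frac{T_i}{T_i^{\mathrm{alone}}}, \qquad
    \mathrm{Perf} = \sum_i T_i, \qquad
    \mathrm{Fairness} = \frac{\min_i s_i}{\max_i s_i}, \qquad
    \mathrm{PFP} = \frac{\mathrm{Perf}}{\mathrm{Fairness}}
    \]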

  5. After some thought, I haven't been able to come up with a good alternative metric that captures both performance and fairness using a single number.

    Instead of using a single number, one option is to define a ("performance", fairness)-tuple for each workload. Using the same example as in my previous comment, the tuples for the two schedulers would be the following.

    /* Scheduler A */
    ("performance", maximum slowdown) = (4, 2)

    /* Scheduler B */
    ("performance", maximum slowdown) = (3, 2)

    I've used the maximum slowdown as the metric for fairness, because I think it's more robust than the min-to-max slowdown ratio, which is susceptible to high fluctuation since it relies on the two most outlying benchmarks within a workload. (The CAL'11 paper by Vandierendonck and Seznec also agrees, but for different reasons.)

    To compare the two schedulers, you would do an element-wise comparison of the tuples. For "performance" (lower is better), Scheduler B is awarded one point while, for fairness, there is a tie so both schedulers receive zero points. To compare the two schedulers for multiple workloads, you would do the same workload-by-workload comparison between the two schedulers and sum up all the points to determine the winner.
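
    As a sketch of the point tally (my own illustration of the procedure above; result_t and the scoring convention are invented for this example, with lower being better for both elements):

    typedef struct { double perf; double max_slowdown; } result_t;

    /* Element-wise tuple comparison across workloads: a positive score means
       scheduler A wins, negative means B wins, zero is a tie. */
    int compare_schedulers(const result_t *a, const result_t *b, int num_workloads)
    {
        int score = 0;
        for (int i = 0; i < num_workloads; i++) {
            if (a[i].perf < b[i].perf) score++;
            else if (a[i].perf > b[i].perf) score--;
            if (a[i].max_slowdown < b[i].max_slowdown) score++;
            else if (a[i].max_slowdown > b[i].max_slowdown) score--;
        }
        return score;
    }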

    This is just my suggestion. Other people may have better ideas.

  6. Yoongu, thanks. I agree with your point that PFP (as defined) is not appropriate. So we will change the metric for the third track. I'm hoping this discussion will help us converge on an appropriate alternative metric within a week or two.

    PFP (as defined) turns out to be a bad metric because doing well on some program ends up hurting the fairness metric and overall PFP. So you're almost discouraged from doing too well on some program. Like you suggest, let's change the fairness metric from min-to-max slowdown to just the max slowdown. That change would restore sanity to the PFP metric :-). Do you see any problem with this new measure (product of overall performance and the performance of the most affected program)?
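
    Written out with the same notation as in the comments above (a candidate definition until the rules are finalized):

    \[
    \mathrm{PFP}' = \Big( \sum_i T_i \Big) \times \max_i \frac{T_i}{T_i^{\mathrm{alone}}}
    \]

    On the example above, this gives 4 x 2 = 8 for Scheduler A and 3 x 2 = 6 for Scheduler B, so B correctly wins.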

    The element-wise comparison of tuples would not work well when comparing (say) five different schedulers in your next paper. But it could work in a memory scheduling competition. We could do a March-Madness-style bracket where each scheduler has a match with one other scheduler for the right to advance. :-) I'm only half-serious here.

    Replies
    1. Hi Rajeev. I never thought of it that way, as March Madness :) Anyway, the new metric seems more robust; at least I can't think of an obvious exploit. One caveat: you would need to ensure that the single-thread execution latencies of all the applications are more or less the same. Otherwise, the metric may degenerate into shortest-job-first, where you assign the highest priority to the application with the lowest single-thread execution latency, the next highest priority to the next one, and so on. This is because, when trying to minimize the maximum slowdown, the "shortest" application has the lowest value in the denominator (single-thread latency), so you really want to make sure that its numerator (multiprogrammed latency) is low as well.

  7. According to the configuration in the published manual, the 1-channel case seems to be confined to a x4 organization. Is there any special reason for restricting the 1-channel case to x4 chips? This is not about the tool; I'm just curious. Thank you in advance!

    Replies
    1. To support 16 GB capacity on a 2-rank channel, we had to use 16 x4 4Gb chips per rank -- we were restricted to using the chips supported by the Micron power calculator. For the lower-capacity configurations, we had more options. It was a somewhat arbitrary choice to stick to x4 chips to minimize the number of variables that were changing.

      Since we do include power models for x8 and x16 chips, the user can code up alternative configurations. The competition will only consider the configurations listed in Table 2 of the tech report.

  8. Hi,
    From the code, it appears that there is no support for multiple memory controllers. Is that planned for upcoming versions of the simulator, or is the competition only for a single memory controller?

    Replies
    1. Raghavendra, we do support multiple memory controllers. Each channel is controlled independently, so the scheduler for each channel can be viewed as a separate memory controller. We'll be using two configs in the competition, one of which is 4channel.cfg. So the competition will evaluate the case with 4 memory controllers.

  9. Hi,
    I've downloaded the 1G traces successfully, but I realized that four 1G traces are too heavy to simulate.
    A single 1G trace is not a problem.
    However, while a simulation with four 1G fluid traces finishes in 37 minutes, a simulation with four 1G black traces does not finish (my workstation has been running for 3 days and is still going).

    It would be reasonable to use 100M, 10M, or 1M traces in the real competition.
    In addition, the 1G and 100K traces look similar in terms of read density, write density, read hit ratio, write hit ratio, etc. (except in the face case).

    Replies
    1. I didn't have any trouble with my experiments, although I don't think I tried the case with four 1G black traces. We'll check it out.

    2. I only tested on version 1.1. My experiment with four blackscholes traces (1 billion instrs each), 1channel.cfg, and the FCFS scheduler finished in 16 minutes. If your problem persists, let us know (best to email usimm@cs.utah.edu with details about your simulation).

    3. Hi,

      Our professor gave us 1G traces that worked with the older usimm version, and I encountered the same problem. A few other students in our class had the same issue, so I think the older usimm version has this problem for sure. I tried with the newer one and, as you said, it finished in less than 15 minutes (I haven't tested other cases though).

      Also, usimm seems to take a lot of memory; is this a leak in the code?

    4. Hi Minhaj,

      Thanks for pointing out the memory requirements issue. We will look into it.

    5. Hi, the simulation time problem occurs with the old version.
      However, in v1.1, the simulation time is under 15 minutes for the 4-core test.
      Thanks for releasing 1.1.

  10. Version 1.1 is available. The switch to 1.1 should be quite straightforward -- simply copy your scheduler.c and scheduler.h into the new src/ directory. The tool will take longer to download because the input/ directory includes five billion-instruction traces.

    A summary of changes can be found here.

    The tool itself can be downloaded here. (190 MB)

  11. Hi, Are we allowed to change only scheduler.c and scheduler.h for the competition? Or can we make changes in other files if required by our algorithm?

    Replies
    1. Hi Ram,

      For the competition, contestants are allowed to change just the two files you mentioned. The other files contain models for the processor and the DRAM timing which should not be modified. The scheduler just needs to pick a valid scheduling candidate. We have exposed most of the important variables in the simulator (from the processor and memory controller structures), so scheduling decisions can be based on a combination of those.

      Let us know if you have any more questions.

    2. Hi,
      Are we allowed to define/initialize variables in other files to act as hardware structures for our algorithm, like adding a variable to the read queue in memory_controller.c?
      Also, if our algorithm wants to change priorities while adding an entry, can the code be added to the insert_read function (memory_controller.c)?

      Thank you.

    3. Hi Ram,

      You should not change any files other than scheduler.c and scheduler.h. scheduler.h declares an init function that you can use to initialize your own variables.
      Also, note that there's a user_ptr field in the request_t structure (i.e., a read queue entry) which can be used to tag read queue entries. In the schedule() function you can check whether that field has been initialized and assign your own values. Hope this helps.

    4. Hi Niladrish,
      The request_t structure has a user_ptr field which can be used, but it can only be initialized when init_new_node is called by insert_read, both in memory_controller.c. So if we are not supposed to modify that file, is there any way to initialize a new read request entry to hold a value that our algorithm uses? (As far as I understand, init_scheduler_vars is called only once, not whenever a new entry is created.)

    5. Hi Ram,

      You are right about the init_scheduler_vars() function - it is called only once and hence can't be used for tagging the read requests. But in schedule() (which gets called every memory cycle), you can look for read requests that do not have their user_ptr field set and identify them as new requests (or alternatively check their arrival time against the current time to figure out the same). Then you can attach your own data structure to each of these read requests.

      The other possible solution is to mirror the read queue in your own data structure which you update (add or modify) in schedule() every cycle and remove entries from the queue after you issue the last command for that entry.
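
      A sketch of the first approach (my illustration only: my_tag_t is a user-defined structure, and it assumes user_ptr starts out NULL for new requests -- see the discussion below):

      #include <stdlib.h>

      typedef struct { long long int first_seen; /* plus any per-request state */ } my_tag_t;

      void schedule(int channel)
      {
          request_t *req = NULL;
          LL_FOREACH(read_queue_head[channel], req) {
              if (req->user_ptr == NULL) {   /* newly arrived request */
                  my_tag_t *tag = (my_tag_t *) malloc(sizeof(my_tag_t));
                  tag->first_seen = CYCLE_VAL;
                  req->user_ptr = tag;
              }
          }
          /* ... base scheduling decisions on each request's tag, and free the
             tag once the last command for the request has been issued ... */
      }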

    6. Hi Niladrish,
      Sorry for bugging you again.
      The user_ptr field is not initialized to NULL, so any check in scheduler.c for whether it has been initialized results in a seg fault at random intervals if we use free. The error does not occur if we initialize user_ptr = NULL in init_new_node in memory_controller.c. Is this allowed?

    7. We will include that in the next usimm release, so for the time being, you can make that change in memory_controller.c and go forward with that. Later, you can replace the existing memory_controller.c with the new version of that file and things will move smoothly. Sorry for the inconvenience.

  12. I'm very sad to report this:
    my workstation has 8 GB of memory, and it fails when running four blackscholes_1G or four canneal_1G traces with version 1.1.
    I think the init_new_node() function causes this problem: init_new_node() is called from insert_read() and insert_write(), and "request_t *new_node" is never freed.
    So, as the read/write count increases, memory fills up with huge numbers of read and write requests that are never de-allocated. (black's read+write count is 7.4 million and canneal's is 17.9 million, while body is 4.2 million, fluid is 4.8 million, and freq is 5.6 million.)
    I doubt any workstation could simulate eight canneals.
    It looks like a de-allocation step is needed in the code.

  13. Hi stone moon,

    The issue can be fixed by inserting a couple of calls to free in memory_controller.c.

    Insert the following call to free() after the macro invocation LL_DELETE(read_queue_head[channel],rd_ptr) on line 641 in memory_controller.c:

    free(rd_ptr);

    Similarly, after LL_DELETE(write_queue_head[channel],wrt_ptr) on line 657, insert a call to free as follows:

    free(wrt_ptr);

    This should take care of deallocating the request nodes. We will include this fix in the next release.
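
    Put together, the patched regions would look like this (line numbers as above, in v1.1's memory_controller.c):

    /* Around line 641, after a read request retires: */
    LL_DELETE(read_queue_head[channel], rd_ptr);
    free(rd_ptr);   /* new: release the request node */

    /* Around line 657, after a write request retires: */
    LL_DELETE(write_queue_head[channel], wrt_ptr);
    free(wrt_ptr);  /* new: release the request node */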

    Sorry for the inconvenience and thanks for highlighting the problem.

    Replies
    1. Hi Niladrish Chatterjee,
      Thank you for your advice.
      I've applied your suggestion to the source code, and it works well :D (usimm now takes only 0.1% of total memory when running).

      Thank you again.

  14. Hello, I got an error while using scheduler.c (adapted from scheduler-fair.c) in usimm-v1.1.
    The message is:
    usimm: memory_controller.c:1540: update_memory: Assertion `is_refresh_allowed(channel, rank)' failed.
    Please let me know why this error occurred.

    Replies
    1. Hi chungchung,

      Can you please email your scheduler.c and scheduler.h (assuming those are the only files you have changed), plus the command line you used for this run, to usimm@cs.utah.edu.
      This will help us reproduce the problem.

      Thanks,
      Niladrish

    2. Hi chungchung,

      We went through the code and could not spot or reproduce the problem. Can you send us more details about the scheduler and inputs you used if the problem persists?

    3. Hi Niladrish,
      I'm sorry: I removed the file and re-downloaded usimm,
      and it went well!

      I'm very sorry, and thank you for your attention.

  15. Hi,
    I am getting "process killed" for all the 1-channel configurations that take 4 trace files. The simulation for 1 channel with 1 trace file runs fine.

    Replies
    1. If this is because of a memory leak, please apply Nil's fix above.

      If not, confirm that the input/ directory has all the original files. If the problem persists, please send email to usimm@cs.utah.edu with details about the command line, output, and the scheduler.c/h files.

    2. thanks.

      For the performance track, do we need to compare the sum total of the completion times from all the cores, or only the final completion time of the entire simulation irrespective of core number?

    3. The metric in the performance track will be the sum of completion times for all cores, not just the final completion time for a workload.

  16. Hi, I found a bug.
    scheduler-refr.c does not operate normally on all the traces you have distributed: in the 1-channel configuration, only the "body" trace works, while "black", "can", "fluid", and "freq" fail due to an assertion failure.
    I tried to debug this situation and figured out the reason.

    During the simulation, "last_refresh_completion_deadline" and "next_refresh_completion_deadline" are shifted together by the amount of "8*tREFI",
    and "refresh_issue_deadline" is updated to "next_refresh_completion_deadline - tRP - tRFC*(8 - num_issued_refresh)" every DRAM cycle.
    Here is the bad situation.
    When all 8 refreshes are issued in advance, "refresh_issue_deadline" equals "next_refresh_completion_deadline - tRP".
    If the last (8th) refresh command is issued at "next_refresh_completion_deadline - tRP - tRFC/2",
    "dram_state.next_refresh" is updated to "next_refresh_completion_deadline - tRP + tRFC/2", which causes the assertion failure:
    at "next_refresh_completion_deadline - tRP", the "is_refresh_allowed" function returns 0 because "CYCLE_VAL < dram_state.next_refresh"
    (CYCLE_VAL = next_refresh_completion_deadline - tRP, dram_state.next_refresh = next_refresh_completion_deadline - tRP + tRFC/2).

    Additional code is needed to avoid this problem (in memory_controller.c).
    At line 1547 in memory_controller.c,
    "else if(CYCLE_VAL == refresh_issue_deadline[channel][rank])"
    should be changed to
    "else if( (CYCLE_VAL==refresh_issue_deadline[channel][rank]) && (num_issued_refreshes[channel][rank]<8) )"
    in order to skip the timing check when all 8 refreshes have already been issued in the 8*tREFI window.

    Of course, I can avoid this situation in scheduler-refr.c by restricting refreshes once
    "CYCLE_VAL > next_refresh_completion_deadline - (8 - num_issued_refresh)*tRFC - tRP".
    However, I don't want that conservative restriction, which limits performance improvement.

    It would be very helpful if you could consider this situation when you release usimm 1.2.

    Thank you.

    Replies
    1. Hi stone moon,

      Thanks for reporting this bug. We will look into it. Can you send us the exact command line to reproduce it?

      Sorry for the inconvenience.

      Thanks,
      Niladrish

    2. Hi Niladrish Chatterjee,

      I just ran with the basic code and command:

      1. Compile scheduler-refr.c and scheduler-refr.h in the "src" directory by changing the Makefile.

      2. Run from the home directory:
      bin/usimm input/1channel.cfg input/blackscholes_1G.input_1.1

      You can see the assertion fail at processor cycle 403353592.

  17. Version 1.2 is now available. Download the tool and copy your own scheduler.c and scheduler.h into the src/ directory. This version fixes the memory leaks and bugs that have been reported in the above comments (thanks to everyone for reporting the bugs and fixes!). The only files that have changed are memory_controller.c and our license file.

    A summary of changes can be found here.

    The tool itself can be downloaded here. (190 MB)

  18. Hi,
    Thank you for releasing such a valuable simulator.

    I got an assertion error when I used the autoprecharge command on usimm v1.2.

    usimm: memory_controller.c:1548: update_memory: Assertion `is_refresh_allowed(channel, rank)' failed.

    In is_autoprecharge_allowed(), the interval after the autoprecharge is calculated based only on CYCLE_VAL; however, the actual interval can be based on dram_state[channel][rank][bank].next_pre in issue_autoprecharge().
    Unlike the other commands, the autoprecharge command can add T_RP to dram_state[channel][rank][bank].next_*.
    Therefore, in some cases, it causes a violation of refresh_issue_deadline.

    I guess the following is_autoprecharge_allowed() is correct:

    int is_autoprecharge_allowed(int channel, int rank, int bank) {
        if (((cas_issued_current_cycle[channel] == 1) && ((max(CYCLE_VAL + T_RTP, dram_state[channel][rank][bank].next_pre) + T_RP) <= refresh_issue_deadline[channel][rank]))
            || ((cas_issued_current_cycle[channel] == 2) && ((max(CYCLE_VAL + T_CWD + T_DATA_TRANS + T_WR, dram_state[channel][rank][bank].next_pre) + T_RP) <= refresh_issue_deadline[channel][rank])))
            return 1;
        else
            return 0;
    }

    or

    int is_autoprecharge_allowed(int channel, int rank, int bank) {
        if (((cas_issued_current_cycle[channel] == 1) && ((dram_state[channel][rank][bank].next_pre + T_RP) <= refresh_issue_deadline[channel][rank]))
            || ((cas_issued_current_cycle[channel] == 2) && ((dram_state[channel][rank][bank].next_pre + T_RP) <= refresh_issue_deadline[channel][rank])))
            return 1;
        else
            return 0;
    }


    Thank you,

    Keisuke KUROYANAGI

    Replies
    1. Thanks for being so quick to point this out (also Kouhei Hosokawa, who emailed the usimm group). We've updated the v1.2 web link with the fixed code. Apologies for the oversight and resulting confusion!

  19. Hi,
    Can we implement a closed-page policy by modifying only scheduler.c? The simulator uses an open-page policy. What should we do from the scheduler function so that we can selectively close some pages just after the read command is sent? Any pointers for this?
    Thank you.

    Replies
    1. Hi Ram,

      There are two options for precharging a bank after issuing a read or write.
      The first option is to issue an explicit precharge command using the issue_precharge_command function. For this command to be successful, enough time has to pass after a read or write for a bank to be ready for a precharge (which can be checked by the is_precharge_allowed function).

      The second option, which is the one that emulates the READ_AUTO_PRE command functionality offered by memory controllers, is to issue an autoprecharge command on the same cycle when you issue a read or a write. This can be achieved by calling the issue_autoprecharge_command the very same cycle that a COL_RD or COL_WR is issued through the issue_request_command function. This command will effectively close the row after the current read or write is performed and you don't have to spend an extra command bus cycle to send an explicit precharge or wait till the precharge conditions are met before issuing the explicit precharge command. (see appendix A).
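
      A sketch of the second option (illustrative only; it assumes the v1.2 interface and that request_t exposes the bank address through a dram_addr field -- check memory_controller.h for the exact names):

      void schedule(int channel)
      {
          request_t *req = NULL;
          LL_FOREACH(read_queue_head[channel], req) {
              if (req->command_issuable) {
                  issue_request_command(req);   /* may issue ACT, PRE, COL_RD, COL_WR */
                  /* If a column read/write just went out, fold in the precharge;
                     is_autoprecharge_allowed() returns 0 otherwise. */
                  if (is_autoprecharge_allowed(channel, req->dram_addr.rank, req->dram_addr.bank))
                      issue_autoprecharge_command(channel, req->dram_addr.rank, req->dram_addr.bank);
                  return;
              }
          }
      }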

      Thanks,
      Niladrish

  20. Hi,
    Thank you for your reply. I have one more query. I added a few lines to the FCFS scheduler to print the contents of the queue and the entry that was scheduled (printed inside (( )) ). We checked Bank 4 and found an issue:

    Bank 4 Row 38104 ThreadID 1 is_issuable 0 COL_READ_CMD
    Bank 4 Row 39795 ThreadID 1 is_issuable 1 ACT_CMD
    Bank 4 Row 100125 ThreadID 3 is_issuable 0 PRE_CMD
    Bank 4 Row 39795 ThreadID 1 is_issuable 1 ACT_CMD
    Bank 4 Row 66491 ThreadID 2 is_issuable 1 ACT_CMD
    Bank 4 Row 66491 ThreadID 2 is_issuable 1 ACT_CMD

    -------------------------------------------
    ((Bank 4 Row 39795 ThreadID 1 is_issuable 1 ACT_CMD))

    -----------------------------------------------------------------
    Bank 4 Row 38104 ThreadID 1 is_issuable 0 COL_READ_CMD
    Bank 4 Row 39795 ThreadID 1 is_issuable 0 COL_READ_CMD
    Bank 4 Row 100125 ThreadID 3 is_issuable 1 PRE_CMD
    Bank 4 Row 39795 ThreadID 1 is_issuable 0 COL_READ_CMD
    Bank 4 Row 66491 ThreadID 2 is_issuable 0 PRE_CMD
    Bank 4 Row 66491 ThreadID 2 is_issuable 0 PRE_CMD

    -------------------------------------------
    ((Bank 4 Row 100125 ThreadID 3 is_issuable 1 PRE_CMD))

    -----------------------------------------------------------------

    Bank 4 Row 38104 ThreadID 1 is_issuable 1 ACT_CMD
    Bank 4 Row 39795 ThreadID 1 is_issuable 1 COL_READ_CMD
    Bank 4 Row 100125 ThreadID 3 is_issuable 1 ACT_CMD
    Bank 4 Row 39795 ThreadID 1 is_issuable 1 COL_READ_CMD
    Bank 4 Row 66491 ThreadID 2 is_issuable 0 PRE_CMD
    Bank 4 Row 66491 ThreadID 2 is_issuable 0 PRE_CMD

    -------------------------------------------
    ((Bank 4 Row 38104 ThreadID 1 is_issuable 1 ACT_CMD))

    -----------------------------------------------------------------

    Bank 4 Row 38104 ThreadID 1 is_issuable 0 COL_READ_CMD
    Bank 4 Row 39795 ThreadID 1 is_issuable 1 COL_READ_CMD
    Bank 4 Row 100125 ThreadID 3 is_issuable 0 PRE_CMD
    Bank 4 Row 39795 ThreadID 1 is_issuable 1 COL_READ_CMD
    Bank 4 Row 66491 ThreadID 2 is_issuable 0 PRE_CMD
    Bank 4 Row 66491 ThreadID 2 is_issuable 0 PRE_CMD

    -------------------------------------------
    ((Bank 4 Row 39795 ThreadID 1 is_issuable 1 COL_READ_CMD))

    -----------------------------------------------------------------

    Row 39795 was sent an ACT_CMD, then Row 100125 was sent a PRE_CMD and Row 38104 an ACT_CMD, followed again by Row 39795 with a COL_READ_CMD. Should that not be a PRE_CMD? Is there anything wrong with our interpretation?

    Replies
    1. Ram, note that each rank has a "Bank 4", so maybe that's leading to the confusion. If these stats pertain to a single rank, please email us the command line and other details for your simulation so we can reproduce the possibly unexpected behavior.

  21. Hi,
    Are we allowed to communicate through global variables between DRAM controllers (in scheduler.c)?
    How can we account for the communication time overhead?

    Replies
    1. Yes, you can add variables in scheduler.c to exchange values between memory controllers.

      The program committee will evaluate whether your algorithm is "implementable". It's best to make a reasonable assumption for the communication overhead between controllers, e.g., based on a 15mm x 15mm chip and global wire speeds.
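
      As a rough, purely illustrative estimate (every number here is my assumption, not a contest parameter): with global on-chip wires at roughly 100 ps/mm, crossing a 15 mm die takes about 15 mm x 100 ps/mm = 1.5 ns, i.e., one or two cycles of an 800 MHz DRAM command clock (1.25 ns per cycle). So values exchanged between controllers should be treated as at least a few cycles stale.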

  22. Hi, Rajeev.
    I have a question.
    How do I submit the source code and paper on the due date?
    Via e-mail?

    Replies
    1. We're in the process of setting up the submission site. Most likely, you'll submit the pdf on the submission site and the code via email. Stay tuned for submission details.

      We'll also release version 1.3 shortly (very minor code changes). We'll also be releasing more (but not all) traces. Consistent with previous JWACs, contestants will only see a subset of the workload before the submission deadline.

  23. The program committee for the competition is now posted on the competition page. Since there were a few questions about this, I thought I'd clarify the role of the PC.

    The competition winners will be determined entirely by the numbers produced by the simulation experiments. The PC will check the following criteria and provide feedback that may help authors with future submissions of their work. Criteria that must be fulfilled to qualify for the competition:

    1. The algorithm must be "implementable" on modern hardware. This includes meeting the storage budget of 68 KB and being computationally tractable.

    2. The paper must offer new insight beyond published work. This could either be a new scheduling algorithm, an effective combination of known scheduling heuristics, or the authors' own previously published algorithm analyzed on the new simulation infrastructure.

    Please email me if there is any confusion about the competition rules.

    Replies
    1. P.S. The PC can also recommend acceptance for a paper that may not have the best results, but that provides significant new insight.

  24. Version 1.3 is now available here (378 MB).

    This is hopefully our final release before the competition. The code changes in version 1.3 are minor. The most significant addition is the set of workloads that will be used for the competition submissions. Details on version 1.3 additions can be found here. Please read it carefully.

    In the next day or two, we'll also release files that will facilitate quantifying and reporting the final competition metrics.

    As always, please email usimm@cs.utah.edu if you have any questions or feedback.

  25. Hi,
    Can I send both the paper and the source code via e-mail?
    Because of my company's security policy, sending e-mail is easier than uploading files.

  26. Here is a perl script to make it easier to compute final metrics and graph results. (Note that you'll have to rename the file to "usimm-script.pl")

    The perl script reads your "runsim" script, finds your output files in an output/ directory, and produces the following:

    1. The total execution time, PFP, and EDP metrics on stdout.

    2. A csv file that can be read by excel and that allows you to graph the performance and EDP numbers.

    Please read the perl script documentation for more details, especially the naming convention for output files. I hope people find this useful. As always, contact usimm@cs.utah.edu if you have any questions or suggestions.

    (Thanks to Manju Shevgoor for writing this script.)

  27. The submission site for the MSC is now open. The submission deadline is Tuesday, April 24th, 9pm PDT (Pacific daylight saving time). Contestants must submit a 6-page conference-style paper pdf on the site and email their scheduler.c/h to nil@cs.utah.edu.

    Please email me if you have any questions about the process.

  28. Here are links to a sample results table: pdf and tex. Submissions are not required to include such a table. But I suspect many will -- hopefully, this template will reduce effort.

  29. Please note that blind submissions are allowed, but not required. If you're submitting a minor extension of your own published scheduler, it probably makes sense to identify yourself in the submission.

  30. First, thanks for this useful tool for understanding DRAM internals and experimenting with different scheduling policies.

    I have a question regarding the trace format. I plan to use Pin instrumentation to generate traces. The trace format specifies that each memory instruction is displaced from the preceding memory instruction by some value >= 0. For a gap of 0, the two instructions are fed to the ROB next to each other; for a gap > 0, the second memory instruction should be fed only after the corresponding CPU cycles have elapsed. I understand that this is possible for out-of-order execution, but for a large gap (say 1000+), a small instruction window can't capture it. However, the fetch engine seems to disregard this in the current implementation.

    The code snippet below (from main.c) shows this:
    while ((num_fetch < MAX_FETCH) && (ROB[numc].inflight != ROBSIZE) && (!writeqfull))
    {
    /* Keep fetching until fetch width or ROB capacity or WriteQ are fully consumed. */
    }

    Thanks in advance for your suggestion.

    Replies
    1. If there is a large gap between two memory instructions, it'll take many cycles before the second instruction can even enter the ROB. The code does show that the ROB is advanced MAX_FETCH at a time, while never exceeding the ROB size or write buffer size. I'm not sure what you're confused by and what you feel is being disregarded.
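
      Schematically, the fetch loop behaves like the sketch below (illustrative, not the actual main.c; gap_remaining and the helper functions are invented for this example, while MAX_FETCH, ROBSIZE, and writeqfull come from the snippet above):

      int num_fetch = 0;
      while ((num_fetch < MAX_FETCH) && (rob_occupancy < ROBSIZE) && (!writeqfull)) {
          if (gap_remaining > 0) {
              gap_remaining--;   /* a non-memory instruction enters the ROB */
          } else {
              enqueue_memory_request(trace_line);   /* the memory instruction enters */
              trace_line = read_next_trace_line(&gap_remaining);
          }
          rob_occupancy++;
          num_fetch++;
      }
      /* A memory instruction with a gap of 1000+ thus enters the ROB only after
         1000+ earlier slots have been fetched (and retired, given the ROB limit). */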

  31. Thank you for this simulator. I am trying to model an approximate DRAM using USIMM. I'd like to start off by modifying a few refresh parameters. I went through the params.h file but did not find numbers for these there. How do I go about doing this? Thanks!

    Replies
    1. Bhaskar, most of the DRAM timing parameters can be found in the input/*.vi files. There's a separate file for each DRAM chip type.

  32. Hi. I have a question about the workloads. I don't understand how you generated the workloads with Simpoint. I don't think Simpoint prints addresses; did you modify Simpoint's code or use another tool? If you used another tool, please let me know which one. Thank you.

    Replies
    1. We use a methodology similar to Simpoint's. The traces were ultimately produced by Simics, but you can also use other tools like Pin.
